To the Reader


This book is designed for the individual who wants to learn the basic concepts and 
procedures of statistics as it is used today for educational, professional, or personal 
reasons. You may be interested in statistics as part of a liberal education. You may 
be a student of or employed in the field of education, engineering, or one of the 
social, biological, or physical sciences. The methods taught here are common to all 
these fields. You will find examples of the application of statistical techniques to 
each of these fields in the book. 

This is a “how to” book. It will teach you how to perform a number of useful 
statistical tests and procedures. It will teach you how to determine which of these 
tests and procedures are appropriate under given circumstances. Although you will 
learn the basic concepts on which the tests are based, you will not learn the 
mathematical derivation of every formula nor all the possible variations of every 
test. The book assumes only a knowledge of elementary algebra—enough to 
substitute numbers for symbols in a formula and to keep the + and - signs
straight when you are working with positive and negative numbers.

Computers are generally used for statistical calculations when large amounts of 
data are analyzed and extensive, repetitive calculations are required. In this book, we 
provide examples of the output of computer programs for statistical analysis. 
However, we emphasize the kinds of statistical work that can be done “by 
hand”—problems with small amounts of data and relatively simple computations. 
Such problems can be completed with a pencil and paper or a simple electronic 
calculator. You will be able to use these techniques for yourself in everyday work 
and study. Since the computer techniques are an extension of the same basic 
methods taught here, this book will also help prepare you to become a more 
sophisticated consumer of these services. 

As you work through this book, remember that the results of statistical tests and 
procedures are only as meaningful as the data on which they are based. The 
problems of measurement and experimental design are beyond the scope of this 
book and are touched on only in passing. In many cases you will have to apply 
knowledge of your field from other sources when you begin to put your statistical 
skills to use. At the end of the book is a test that you may use as a final 
examination to evaluate your progress. It includes material from all chapters. If 
you successfully complete the review problems for each chapter and the test, you 
may be confident that you have mastered the material in this book. 


DONALD J. KOOSIS
New York, New York 


How to Use This Book 


This book is organized into numbered sections called frames. Each frame presents
some new material, asks you to answer a question, and gives the correct answer.
When you study the book, use an index card or note pad to cover the book’s
answers. Slide the card down the page until you come to a dotted line. Write your
answer; then move the card down to check your answer against the correct one.
Most of the time you should be able to answer correctly. If you find that you have
made an error, look back at the preceding material to make sure you understand
the correct answer before you go on.

This is not a book you can read through quickly. Statistics, like other mathemati- 
cal subjects, requires concentration. Each chapter represents about two hours’ 
work. If possible, try to complete a chapter in one or two sessions and stop only at 
the end of a section or chapter. Frequent interruptions make the material more 
difficult to master. 

All the computations required in this book can be worked out by hand with 
reasonable effort; however, there is no special virtue in doing the arithmetic by 
hand. If possible, you should use an electronic calculator. The best choice is one 
that computes squares and square roots automatically. (It will have keys labeled x²
and √x.) If you have a programmable calculator, work through the first few
problems of each type step by step before you try to use the programming features. 

If your interest is primarily in understanding the steps involved in a test, and you 
have only limited patience for arithmetic, you may choose to skip some of the 
computations. When you make this choice, be sure nevertheless that you set up the 
problem and compare your work, as far as it goes, with the solution in the answer. 

At the end of each chapter is a set of review problems. These problems not only 
review the content of the chapter; they also relate it to other material you have 
learned. Working these problems is an important part of the study process. Do not 
skip them. 

At the end of the book is a test that you may use as a final examination to 
evaluate your progress. 
STATISTICS


CHAPTER ONE 


Basic Skills 


This chapter is about how to summarize data. When you perform an experiment, 
conduct a survey, or collect information, you usually wind up with a number of 
observations. For example, you have the test scores of 27 classmates, or the
opinions of 12 jurors, or the height of 100 redwood trees. Your first problem is to 
summarize this information in some way that lets you generalize—that lets you see 
the forest as well as the individual trees. 

One common and useful way to summarize a group of observations is to draw a 
graph—a frequency-distribution graph. Another way is to compute some sort of 
“average” that describes a typical observation—a measure of central tendency. 
Three commonly used measures of central tendency are the mean, the median, and 
the mode. 

It is also useful to describe numerically how much the observations differ from 
one another—what we call their variability. Three common measures of variability 
are the range, the standard deviation, and variance. 

This chapter will teach you the most common methods of summarizing data. 
When you have completed this chapter you will be able to: 


• construct a frequency-distribution graph;
• recognize and apply some common vocabulary to describe frequency distributions;
• compute three measures of central tendency: the mean, the median, and the
  mode;
• compute three measures of variability: the range, the standard deviation, and the
  variance.
HOW TO CONSTRUCT A FREQUENCY-DISTRIBUTION GRAPH


The frequency distribution is a useful summary of most kinds of data. A frequency 
distribution sorts observations into categories and describes how often observations 
fall into each category (either as a number or a percent). Very likely you have used 
frequency distributions to analyze the data of your own daily life. 


1. Consider the following data:

    Ann has green eyes          Alice has blue eyes
    Jane has brown eyes         Judy has blue eyes
    Marie has brown eyes        Sue has brown eyes
    Joan has blue eyes          Carol has brown eyes
    Victoria has brown eyes     Barbara has brown eyes


How would you summarize these data? 


You probably used a table like this: 


brown eyes 6 or 60% 
blue eyes 3 or 30% 
green eyes 1 or 10% 
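
If you would rather let a computer do the tallying, the table above can be sketched in a few lines of Python. This is only an illustration: the list `eye_colors` and the helper name `frequency_table` are names made up for this example, not anything the book defines.

```python
from collections import Counter

# Eye colors of the 10 women listed in frame 1, in the order given.
eye_colors = [
    "green", "blue", "brown", "blue", "brown",
    "brown", "blue", "brown", "brown", "brown",
]


def frequency_table(observations):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(observations)
    total = len(observations)
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]


for category, count, percent in frequency_table(eye_colors):
    print(f"{category:<6} {count:>2}  {percent:5.1f}%")
```

Each row carries both the count and the percent, which are the two ways of expressing frequency mentioned earlier.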


You might also have drawn a frequency-distribution bar graph like the one 
below. 


[Bar graph: percent (vertical axis) by color of eyes (horizontal axis: Brown, Blue, Green).]

Frequency distribution of color of eyes for 10 women.
A bar graph like the one opposite is a useful summary of the data in question.

This bar graph shows the ________ ________ of eye colors.

- - - - - - - - - -

frequency distribution


If you used a computer to analyze your data, you could obtain a printout that 
looks something like this: 


VALUE LABEL FREQUENCY PERCENT 


BROWN 6 60.0 
BLUE 3 30.0 
GREEN 1 10.0 


Your printout might also be in graphic form, like this: 


    COUNT   VALUE LABEL   ONE SYMBOL EQUALS APPROX 1.00 OCCURRENCES
      6     BROWN         ******
      3     BLUE          ***
      1     GREEN         *


Do these printouts contain any additional information not in the table and bar 
graph of frames 1 and 2? (Yes/No) 
4. Suppose you are interested in the types of cars customers of a particular
shopping center drive. You observe the following types of car in the parking
lot on a given afternoon:


4-door sedan 
station wagon 
hardtop 
sports car 
4-door sedan 
sports car 
station wagon 
sports car 
compact 
convertible 
antique car 
station wagon 
station wagon 


compact 
4-door sedan 
4-door sedan 
compact 
4-door sedan 
station wagon 
station wagon 
4-door sedan 
compact 
compact 
4-door sedan 
4-door sedan 
station wagon 


compact 
hardtop 
station wagon 
station wagon 
pick-up truck 
hardtop 
hardtop 
compact 
hardtop 
station wagon 
4-door sedan 
4-door sedan 


Prepare a frequency-distribution bar graph of these observations. 
[Bar graph: percent (vertical axis, 0 to 30) by type of automobile (horizontal axis: Station wagon, 4-door sedan, Compact, Hardtop, Sports car, Other).]

Your graph should look something like the one above. As is customary, it 
measures the number of observations as a percentage along the vertical axis of the 
graph and shows the categories along the horizontal axis. The order in which you list 
the categories is not critical, although it helps the reader if there is some logic to the 
order; for example, here we have listed the categories from highest to lowest 
frequency. 
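
Here is a rough Python sketch of the same procedure: tally the categories, fold the rare ones into "other," and list the bars from highest to lowest frequency. The tallies are our own count of the 38 cars in the list, and the helper names `group_rare` and `text_bar_graph`, the cutoff of 2, and the scale of one `*` per 2 percent are all assumptions made for illustration.

```python
from collections import Counter

# Our tally of the 38 cars observed in the parking lot (category: number seen).
car_counts = Counter({
    "4-door sedan": 10, "station wagon": 10, "compact": 7, "hardtop": 5,
    "sports car": 3, "convertible": 1, "antique car": 1, "pick-up truck": 1,
})


def group_rare(counts, minimum=2):
    """Merge categories seen fewer than `minimum` times into 'other'."""
    grouped = Counter()
    for category, n in counts.items():
        grouped[category if n >= minimum else "other"] += n
    return grouped


def text_bar_graph(counts):
    """Return one line per category, highest frequency first, one '*' per 2%."""
    total = sum(counts.values())
    lines = []
    for category, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        percent = 100.0 * n / total
        lines.append(f"{category:<14}{percent:5.1f}% " + "*" * round(percent / 2))
    return lines


print("\n".join(text_bar_graph(group_rare(car_counts))))
```

Categories with the same frequency stay in the order they were entered, so the order of tied bars here is a matter of taste, just as it is on paper.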
5. Often, instead of categories, the data we wish to summarize are measurements
on a continuous scale; for example, length or time or temperature. Then we
must group the measurements into categories; for example, look at the
following raw data and the frequency distribution based on these temperatures:

    60.0°   62.0°   70.0°   72.5°   74.5°
    60.0    62.0    70.5    72.5    74.5
    60.5    63.5    71.0    73.0    74.5
    61.0    63.5    71.0    73.0    75.0
    61.0    64.0    71.5    73.0    75.0
    61.0    67.5    72.0    73.0    75.0
    61.0    68.0    72.0    73.5    76.0
    62.0    69.5    72.0    73.5    76.5
    62.0    70.0    72.5    73.5    77.0
    62.0    70.0    72.5    74.0    79.0

[Frequency-distribution graph: percent (vertical axis, 0 to 20) by temperature in degrees (horizontal axis), with ten categories: 59.25-61.25, 61.25-63.25, 63.25-65.25, 65.25-67.25, 67.25-69.25, 69.25-71.25, 71.25-73.25, 73.25-75.25, 75.25-77.25, 77.25-79.25.]

The first bar in the graph above represents measurements between ________
and ________°.

- - - - - - - - - -

59.25 and 61.25°
Are there any measurements that fall exactly on the dividing line between two
categories? 


No 


Why do you suppose no measurements fall on the dividing line between two 
categories? 


(a) It was just luck. 
(b) We chose the categories with that in mind. 


(b) We chose the categories with that in mind.


How many categories are used to group the data? 
9. The following illustrations (Graphs A, B, C) show three frequency-distribution
graphs based on the same data. One is the original graph which uses 10 
categories, another uses 40 categories, and the third uses only three categories. 
Which one do you think presents the most useful summary of the data? 


[Graph A: percent (vertical axis, 0 to 20) by temperature; horizontal axis marked 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79.]

[Graph B: percent (vertical axis, 0 to 20) by temperature; horizontal axis marked 60, 61, 62, and so on up to 80.]

[Graph C]
For most purposes Graph A probably presents the most useful summary.
Graph B is so detailed that the over-all pattern is not clear, whereas Graph C 
goes to the other extreme and obscures a difference that is large enough to be 
important. 
10. As a rule of thumb, it is usually best to choose categories so that there are
between 10 and 15 categories in a distribution of measurements; for example, if
you are preparing a distribution of test scores that range from 50 to 100 points, 
which set of categories would be preferred? 


(a) 50-52, 53-54, 55-56, etc.?
(b) 50-59, 60-69, 70-79, etc.?
(c) 50-54, 55-59, 60-64, etc.?


(c) 50-54, 55-59, 60-64, etc. 


11. To set the exact limits of categories consider the accuracy of your measurements;
for example, the measurements in frame 5 were taken with a thermometer
that was accurate only to the nearest half degree. By making the exact
limits of the first category 59.25 and 61.25 we could be sure that no measurements
would fall on the dividing line between two categories.


If you are preparing a distribution of weights that are rounded to the nearest 
gram, which would be the better dividing line between two categories? 


(a) 5 gm? 
(b) 5.5 gm?

(b) 5.5 gm
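
The idea of placing limits between possible readings can be checked with a short Python sketch. The readings below are hypothetical half-degree values chosen for illustration, and the helper names `count_by_category` and `values_on_a_limit` are our own. The exact equality test works here only because every number involved is a multiple of a quarter, which a computer stores exactly; with other data you would want a tolerance.

```python
def count_by_category(values, first_limit, width, n_categories):
    """Count how many values fall into each of n_categories equal-width categories."""
    counts = [0] * n_categories
    for value in values:
        index = int((value - first_limit) // width)
        if not 0 <= index < n_categories:
            raise ValueError(f"{value} lies outside the categories")
        counts[index] += 1
    return counts


def values_on_a_limit(values, first_limit, width):
    """Return the values that fall exactly on a dividing line between categories."""
    return [v for v in values if (v - first_limit) % width == 0]


# Hypothetical readings, accurate to the nearest half degree.
readings = [60.0, 60.5, 61.0, 61.0, 62.0, 63.5, 64.0]

# Limits at 59.25, 61.25, 63.25, 65.25 lie between possible readings.
print(count_by_category(readings, 59.25, 2.0, 3))
print(values_on_a_limit(readings, 59.25, 2.0))

# Limits at 59.0, 61.0, 63.0, 65.0 coincide with possible readings.
print(values_on_a_limit(readings, 59.0, 2.0))
```

With the quarter-degree limits no reading is ambiguous; with whole-degree limits the two readings of 61.0 sit on a dividing line and you would have to decide arbitrarily where they belong.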
12. Using 10 categories, construct a frequency-distribution graph for the following
data:

[Data table illegible in the source.]

- - - - - - - - - -

Your graph should look something like this one. We used an exact limit for 


the first category of 1.05—1.55, but you might equally well have made the 
limits 0.95—1.45. 


[Graph: percent (vertical axis) for the 10 categories.]
DESCRIBING FREQUENCY DISTRIBUTIONS


There are some technical terms commonly used to describe frequency distributions. 
You will find it useful to know these terms: mode, bimodal, skewed, normal. 


13. Distributions of measurements can have various shapes. Some of the more 
common situations are illustrated below. Assume that each of these distribu- 
tions represents the times required by a group of 100 college students to solve a 
given Chinese puzzle. The time limit for each puzzle was half an hour. 


[Four frequency-distribution graphs of solution times:
    Puzzle A: Skewed to left
    Puzzle B: Approximately normal
    Puzzle C: Skewed to right
    Puzzle D: Bimodal]
14. For one of these puzzles there were two obvious approaches to solving it. About
half the students started with the correct approach and solved it quite quickly,
whereas the other half attempted the wrong approach first and took substantially
longer. Which puzzle meets this description and what name is applied to
the distribution?

- - - - - - - - - -


Puzzle D 
Bimodal 


15. For one of the puzzles there was one clear average time. About as many finished
faster than average as finished slower than average and the distribution was bell 
shaped. Which puzzle fits this description and what name is applied to the 
distribution? 


Puzzle B 
Approximately normal 


The normal distribution has a precise mathematical definition and you will learn 
more about it later. 


16. Just working with one of the puzzles for a fixed amount of time automatically
resulted in a solution. Most students simply kept working on the puzzle until it
solved itself at the end of this time, but a moderate number were able to come up
with faster solutions. Which puzzle fits this description and what name is
applied to its distribution?


Puzzle A 
Skewed to the left 


17. The peak, or mode, of a distribution is the most common score; that is, the
tallest bar in the graph. What is the mode of this set of data?

[Data illegible in the source.]

The mode is 5, the most common score.
A peak in a distribution, as shown above, is called a mode. When a
distribution has two peaks, it is called

bimodal
18. Look at the distributions below:

[Two graphs: one labeled "Skewed to left," the other labeled "Skewed to right."]

When there is a greater number of extreme scores on the left side of a
distribution than on the right side, we may say it is skewed to the (left/right).
19. Sketch a distribution that is skewed to the right.

20. Sketch an approximately normal distribution.

21. Sketch a bimodal distribution.


22. What is the mode of this distribution? 


- - - - - - - - - -

37.7 - 37.8
23. What is the mode of this distribution?

[Graph: frequency by height; horizontal axis marked 5'-0", 5'-2", 5'-4", 5'-6", 5'-8", 5'-10", 6'-0", 6'-2".]

5 ft 5 in.
and
5 ft 11 in.
24. Describe each of these distributions in a few words.

[Three graphs, labeled (a), (b), and (c).]


(a) Approximately normal 
(b) Skewed to the left 
(c) Bimodal 


MEASURES OF CENTRAL TENDENCY 


Often, we want to summarize a whole distribution of measurements with a single 
number that is typical of the distribution. This kind of number is called a “measure 
of central tendency.” There are three measures of central tendency you should know
about: the mean, the median, and the mode. Since you already know
about the mode, you have only the median and the mean left to learn. 


25. The median is the middle measurement in a set of measurements.


To find the median you must 

(a) sort the observations in order of magnitude 

(b) then find the middle number 

For example, in the following set of 11 measurements, the sixth is the median: 


[The 11 measurements are illegible in the source.]


The median of this distribution is 


26. What is the median of the following set of measurements? 


2.0, 5.8, 6.1, 10.5, 53.9, 54.0, 78.6 


27. What is the median of the following set of observations? What is the mode? 


55 months, 50, 53, 54, 53, 54, 55, 56, 58, 60, 54, 54, 55, 58, 52, 54, 56, 57, 59 


Median. 

Mode, 

When there is an even number of observations, the middle of the list is halfway 
between two observations. The usual procedure is to call the halfway point the 
median; for example, 


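[Example: a list with an even number of observations, with the halfway point between the two middle observations marked.]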


median = 7.5 
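
The two cases, an odd and an even number of observations, can be written out as a short Python sketch. The function name `median` and the four-item list are our own choices for illustration; the seven-item list is the set of measurements from frame 26.

```python
def median(values):
    """Middle value after sorting; halfway between the two middle values if n is even."""
    ordered = sorted(values)              # (a) sort in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2                       # (b) locate the middle of the list
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_count = [2.0, 5.8, 6.1, 10.5, 53.9, 54.0, 78.6]   # measurements from frame 26
even_count = [5, 6, 9, 10]                             # hypothetical example

print(median(odd_count))
print(median(even_count))
```

Notice that the median of the even-sized list is a value that does not itself appear among the observations.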
28. The mean is the value obtained by adding all the measurements and dividing
by the number of measurements.

For example,

     1.0
     2.0
     3.0
     6.0
     6.0
     7.0
     9.0
    ----
    34.0    (sum of the 7 measurements)

    34.0 / 7 = 4.9    (mean)


For this series of observations, find the mean, median, and mode. (If you have 
an electronic calculator, use it.) 


400, 600, 800, 800, 850, 850, 850, 850, 850, 950, 1000 


Mean, 800 (8800/11) 
Median, 850 
Mode, 850 
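
If you have a computer rather than a calculator at hand, the same three answers can be checked with Python's standard `statistics` module. This is just one convenient way to do it; the variable names are ours.

```python
import statistics

observations = [400, 600, 800, 800, 850, 850, 850, 850, 850, 950, 1000]

mean = sum(observations) / len(observations)   # add everything, divide by the count
median = statistics.median(observations)       # middle value of the sorted list
mode = statistics.mode(observations)           # most common value

print(mean, median, mode)
```

The mean is pulled below the median and mode here by the two low observations, 400 and 600.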


30. Let’s look at the formula for the mean. It looks a little alarming at first, but
since you have already computed a mean you know that the process is simple.

The formula for the mean is

    μ = Σx / n

In this formula

    μ (Greek letter “mu”) stands for the mean,
    x stands for each of the individual observations,
    Σ (summation sign) indicates the operation of summing all the values of x,
    n stands for the number of observations.

We can read the formula “mu equals summation x over n.” Copy the formula
once to get used to it.
31. Now let us apply the formula to some numbers. The following are the durations
in minutes of a series of telephone calls. You want to know the mean duration
of the calls.

    1.0, 5.0, 5.3, 6.0, 6.0, 7.0, 7.7, 8.0, 8.1, 8.9

    μ = Σx / n

In this problem x has a series of different values. What are they?

1.0, 5.0, 5.3, etc., the duration of each call

32. Σx is the sum of all the values of x. What is Σx in this case?

33. What is n?

10 (the number of observations you have for x)

What is μ?
MEASURES OF VARIABILITY


A measure of variability is a way of indicating how dispersed a set of observations is. 
The range of a distribution—the difference between the largest observation in the 
distribution and the smallest—is one crude measure of variability. The other 
measures of variability you will learn about in this book are the standard deviation 
and the variance. 

Even though a computer will often compute standard deviations and variances for 
you, we strongly suggest you use a calculator or pencil and paper to complete the 
computations in this section. Working through the process will help you remember 
the meaning of the formulas. 


34. Standard deviation and variance are measures of 


variability 


The basic formula for standard deviation is

    σ = √( Σ(x - μ)² / n )

σ (Greek letter “sigma”) stands for the standard deviation. The other symbols
should be familiar to you. If your math is a little rusty, don’t panic! We will
walk through the formulas step by step. (If you have already seen a somewhat
different formula be patient; we will get to it in a moment.)


35. Let us apply the formula step-by-step to the following data:

Step 1. Find the mean, μ (see shaded area 1).

Step 2. Find (x - μ) for each of the different values of x. The parentheses
        indicate that this step comes first (see shaded area 2).

Step 3. Find (x - μ)² for each of the different values of x and then find the
        sum of all these values of (x - μ)² (see shaded area 3).

Step 4. Divide Σ(x - μ)² by the number of different values of x (see shaded
        area 4).

Step 5. Find the square root of [Σ(x - μ)²]/n. (Use a calculator or look it up
        in a table like Table II in the appendix; see shaded area 5.)
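
The five steps translate almost line for line into Python. The sketch below is only an illustration: the function name `standard_deviation` is ours, and the seven observations are borrowed from a later exercise in this chapter. Note that this formula divides by n; some calculators and library routines divide by n - 1 instead and will give a slightly larger answer.

```python
import math
import statistics


def standard_deviation(values):
    """Standard deviation by the basic formula, dividing by n."""
    n = len(values)
    mu = sum(values) / n                          # Step 1: the mean
    deviations = [x - mu for x in values]         # Step 2: (x - mu) for each x
    sum_of_squares = sum(d ** 2 for d in deviations)  # Step 3: sum of (x - mu)^2
    variance = sum_of_squares / n                 # Step 4: divide by n
    return math.sqrt(variance)                    # Step 5: take the square root


data = [1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 7.0]

print(round(standard_deviation(data), 2))
print(round(statistics.pstdev(data), 2))   # library routine that also divides by n
```

The second line of output is a cross-check using `statistics.pstdev`, the standard-library function for the divide-by-n version.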


Now apply the same procedure to the following new data by completing the steps
below:
36. The reasons for using this particular formula as the usual measure of variability
depend on mathematical considerations that are beyond the scope of this book.
Other formulas are possible, but this one happens to be the most useful. To
make sure you understand the formula for standard deviation, let us restate it in
other terms. First, the expression (x - μ) is simply the difference between each
observation and the


mean 


37. Once the differences from the mean are computed, the next step is to 


square them 


38. The squared differences from the mean are then added up and divided by the
number of observations; that is, they are

averaged

39. The final step is to

take the square root

40. “Standard deviation is the square root of the mean squared deviation from the
mean.” True or false?

True

41. Compute the standard deviation of this set of numbers:

[Data illegible in the source.]
42. Variance is simply σ². If the standard deviation of a distribution is 2, what is
its variance?
43. If the variance of a distribution is 25.00, what is its standard deviation?

5.00

44. You compute variance the same way that you compute the standard deviation,
except that you skip the last step, taking the square root. So the formula for
σ² is

45. There is another formula for σ and σ² you will often see. It is mathematically
equivalent to the one you have already learned, but it is easier to use for
computation, since it avoids computing (x - μ) for each observation.

    σ² = [Σx² - (Σx)²/n] / n

Let us work through this formula once to give you more practice in interpreting
statistical formulas. It is not necessary to memorize the formula. The observations
are:

    1, 2, 2, 2, 3, 4, 4, 4, 5.

Read the descriptions of the steps and complete the table to find σ².

Step 1. Find Σx by adding all the values of x. Then compute (Σx)² by
        squaring Σx and divide by the number of values of x to find
        (Σx)²/n. Remember, perform the steps in parentheses first.

Step 2. Find x² for each value of x and find the sum of all these values of x²,
        Σx².

Step 3. Compute Σx² - (Σx)²/n and then divide the result by n.
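
To convince yourself that the two formulas really are equivalent, you can compute the variance both ways. The following Python sketch does so for the nine observations above; the function names `variance_computational` and `variance_definitional` are our own labels, not standard terms.

```python
import math


def variance_computational(values):
    """Variance from the formula [sum(x^2) - (sum x)^2 / n] / n."""
    n = len(values)
    sum_x = sum(values)                        # Step 1: sum of x ...
    correction = sum_x ** 2 / n                # ... squared, then divided by n
    sum_x_squared = sum(x * x for x in values)  # Step 2: sum of the squares
    return (sum_x_squared - correction) / n    # Step 3: subtract, divide by n


def variance_definitional(values):
    """Variance from the basic formula sum((x - mu)^2) / n."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


observations = [1, 2, 2, 2, 3, 4, 4, 4, 5]

print(variance_computational(observations))
print(variance_definitional(observations))
print(math.sqrt(variance_computational(observations)))   # standard deviation
```

Both functions should print the same variance, apart from tiny rounding differences inside the computer. The computational version never forms the individual differences (x - μ), which is why it is quicker by hand.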
Complete the following computation:

47. It is important to keep the steps in the right order when using statistical
formulas.

(Σx)² means “find the sum of the values of x, then square it.”

Σx² means “find the values of ________, then find ________.”

x²
the sum

48. (Σx)² means

“First ________; then ________.”

find the sum
find the square

49. Σx² means

“First ________; then ________.”

find the square
find the sum

50. What does the symbol σ² stand for?

Variance
51. What does the symbol σ stand for?

Standard deviation

52. For the following data find μ, σ, σ², and the median.
Data: 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 7.0

The median is the fourth observation in order of magnitude: 2.0.

      x     (x - μ)    (x - μ)²
      1       -2          4
      1       -2          4
      2       -1          1
      2       -1          1
      3        0          0
      5       +2          4
      7       +4         16

    Σx = 21            Σ(x - μ)² = 30

    μ = Σx / n = 21 / 7 = 3

    σ² = Σ(x - μ)² / n = 30 / 7 = 4.29

    σ = √( Σ(x - μ)² / n ) = √4.29 = 2.07

53. Read this computer output and fill in the appropriate information below.


VARIABLE: AGE 

MEAN 37.16 MEDIAN 34.00 STD DEV 11.34 
VARIANCE 128.60 

VALID CASES 272 
REVIEW PROBLEMS


If you have successfully completed this chapter, you can now summarize data in a 
number of ways. You can: 


• construct a frequency distribution;
• describe a frequency distribution as approximately normal, skewed, or bimodal;
• compute the mean, median, and mode of your data;
• compute the standard deviation and variance of your data.


Since you can perform these computations yourself, you also know what the 
computer has done when it presents you with a report of these numbers. 

Now try these review problems. Table I on pp. 253-256 lists any formulas you 
may need for reference. Table I is perforated so you can tear it out easily, if you 
wish. 


1. Construct a frequency distribution for the following data: 


     0.1    2.5    2.6    5.1    5.3    6.7    7.1
     7.3    7.5    7.5    8.9    9.9   10.1   11.3
    11.7   12.5   12.8   14.1   15.0   17.5   18.9
    19.8   21.7   24.4   24.9


2. Sketch a distribution that is skewed to the right. 
3. What is the median of the data in question 1?


4. What is the mean of the data in question 1? 


5. What is the standard deviation of the following data? 
3, 8, 8, 8, 9, 9, 9, 18


Answers


To review a problem, study the frames indicated after the answer.

1. The intervals used for the following distribution are 0.05 to 2.55, 2.55 to 5.05, and
   so on. You may have used different intervals, but your graph should be similar to this
   one. See frames 1 to 12.

   [Frequency-distribution graph; horizontal axis marked 0, 2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, 20.0, 22.5, 25.0.]

2. Skewed to the right
   See frames 13 to 24.

3. Median = 10.1
   See frames 25 to 27.

4. μ = Σx / n = 285.2 / 25 = 11.4
   See frames 28 to 33.

5. σ = √( Σ(x - μ)² / n ) = √(120 / 8) = √15 = 3.87
   See frames 34 to 41.


CHAPTER TWO 
Populations and Samples 


The distinction between a population and a sample is very important in statistics. A 
population includes all possible observations of a particular type. For example, if 
you measured the height of every tree in the forest, you would be measuring a 
population. A sample includes only some of the observations, but selected in a way
that gives every possible observation an equal chance of being chosen. For
example, if you randomly chose 50 trees from a forest and measured their heights,
you would be measuring a sample from a population.

Most of the techniques you will learn in this book have to do with using 
information about a sample to draw conclusions about the population, or vice 
versa. From information about a few randomly selected trees, you can draw some 
conclusions about the forest. Also, from information about the forest, you can 
draw some conclusions about a randomly selected group of trees. 

The tool you use to go back and forth between samples and population is a 
mathematical table called a sampling distribution. In this chapter, you will get a 
brief taste of how sampling distributions are mathematically derived. You will also 
learn how to use two of the most useful sampling distribution tables to draw 
conclusions about a sample on the basis of information about the population. 

When you have completed this chapter, you will be able to: 


• distinguish between a population and a sample;
• use a sampling distribution;
• predict a sample proportion, using a binomial probability table;
• predict a sample mean, using a normal probability table.


POPULATIONS AND SAMPLES 


A population is all conceivable observations of a particular type. A sample is a
limited number of observations from a population, chosen in a way that allows every
possible observation an equal chance of occurring. Using statistics, it is possible to
make statements about what a population is probably like on the basis of information
from a sample. It is also possible to make statements about what samples will
probably be like on the basis of information about the population.


1. In some cases, we use statistical techniques to go from information about a 
sample to information about a population. In other cases, we do just the 
opposite and use information about a population to draw conclusions about
the probable characteristics of a sample. For example, a researcher is interested
in the learning ability of rats that receive a particular diet (she wants to
compare the effect of this diet with that of another diet). There is no clear
theoretical limit to the number of rats she could raise on the special diet, but
for practical reasons she settles for measuring the learning ability of 50 rats. In
this example the 50 rats are the


sample


The researcher will 


(a) use information about the sample to make statements about what the 
population is probably like, 

(b) use information about the population to make statements about what the 
sample is probably like. 


(a) 


On the basis of certain physical laws, an engineer has determined that one out 
of every 27 components of a given type will be defective. He wants to determine 
the probability of finding two or more defective components in a particular 
batch of five. 


In this case the population is 


all components 


The sample is 


- - - - - - - - - -


the batch of five 
The engineer will

(a) use information about the sample to make statements about what the
population is probably like,

(b) use information about the population to make statements about what
the sample is probably like.


(b)


Assume that the United States Census asks every inhabitant of the United
States for his age and asks every 100th inhabitant for additional information
about his education. The frequency distribution of ages obtained from these
observations would be a (sample/population) distribution.

- - - - - - - - - -

population (assuming no one was missed by the Census)


The frequency distribution of education levels would be a (sample/population)
distribution.


sample


In order to find out which of the four supermarkets in your neighborhood has
the best prices, you compile a typical shopping list and price the items on the
list at all four stores. To decide whether the figures you obtain are samples or
populations ask yourself, “Are the observations telling the complete story or am
I just assuming that other observations will be similar?” Are these shopping
list figures populations or samples?


Samples. The populations are all prices at each store; you are assuming that on
other days and with other shopping lists you will obtain similar results.


10. A teacher wants to know the ages of the children in his class. He looks up the
age of each child in the school records. The information he obtains is a
(sample/population).

- - - - - - - - - -

population


11. A number used to summarize a population distribution is called a parameter.
A similar number used to describe a sample distribution is called a statistic; for
example, if you are studying the population of the United States, the mean age
of all inhabitants of the United States is a (parameter/statistic).


parameter


12. A researcher wants to estimate the average number of ladybugs per acre in
Nebraska cornfields. To do so he counts the number of ladybugs in a large
number of randomly selected one-acre plots. The mean number of ladybugs per
acre in his sample is a (parameter/statistic).


statistic


13. The mean number of ladybugs per acre in Nebraska cornfields is a (parameter/
statistic).


parameter. It describes the state of affairs in the population of all Nebraska
cornfields, though, of course, we have no way of counting every ladybug in
every acre.


14. The mean of a sample distribution is a


statistic


15. The mean of a population distribution is a


parameter
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16. The standard deviation of a population distribution is a 


parameter 


17. The standard deviation of a sample distribution is a 


statistic 


As we noted earlier, a sample should be chosen in such a way that every possible 
observation has an equal chance of occurring. However, this requirement is often 
very difficult to meet. Researchers in various fields have developed specialized 
techniques to assure random selection of samples. As a novice, you will probably 
be on safe ground if you learn and use the accepted methods of sampling for your 
field. But you should be aware that the proper selection of samples is one of the 
most difficult areas in the application of statistics. As you work through this book, 
you will find notes pointing out some of the difficulties. For a deeper understand- 
ing of these issues, you will need to study experimental design. 


SAMPLING DISTRIBUTIONS 


The idea of a sampling distribution is the key to our ability to reason back and forth
from populations to samples and from samples to populations.


18. Let us develop a simple example of a sampling distribution. Take a coin out. 
You are interested in the proportion of times that the coin will come up heads 
when you flip it. We will call the event we are interested in a “success.” Flip 
the coin twice and note how often it comes up heads. Write your results here: 


first toss 
second toss 


The results of your two tosses are a (population/sample). 


19. What is the population? 


The result of all possible tosses. Note that in this case the population is
unlimited.


20. Look back at your sample. What proportion (p) of the time did the coin come
up heads?

$p = \dfrac{\text{number of successes (heads)}}{\text{number of tosses}}$


If your coin came up heads twice, p = 1.0
If your coin came up heads once, p = 0.5 (or 1/2)
If your coin came up tails both times, p = 0.0

Note that p must always be between 0.0 and 1.0.


21. The value for p you just computed is a (parameter/statistic).


statistic. It describes a sample. To distinguish between the parameter and the
statistic, we use the capital letter P to identify the proportion of an entire
population and the lowercase p to identify the proportion of a sample.


22. Think about your general experience with flipping coins. If you could perform
a similar computation for the population, what value for P would you obtain?


P = 0.5. Assuming that you are not carrying trick coins, the coin should come up
heads half the time and tails half the time.


23. Take another sample of two tosses and compute p.


If your coin came up heads twice, p = 1.0
If it came up heads once, p = 0.5
If it came up tails both times, p = 0.0


24. Take three more samples. Remember, a sample consists of two tosses. You have
now taken five samples of two tosses each. Summarize the results of your
sampling below:


Sample    p

  1      ___
  2      ___
  3      ___
  4      ___
  5      ___


25. Will the statistic p always equal 0.5?


No. In fact it is unlikely that you found a value of 0.5 for all five samples.


26. If you continue to take a large number of samples of two and you average the
values of p that you obtain from each sample, what do you think will be the
average value of p?


The average value of p will be approximately 0.5. 


Can you construct a frequency distribution for the values of the statistic p
that you obtain from a number of samples?


27. Draw a frequency distribution graph for the five values of p you have obtained
from different samples.


Your frequency distribution should look something like the following graphs.
The mode will probably be 0.5, but not necessarily. Your graph could, for
example, look like any of the graphs below.


28. We can deduce mathematically what the frequency distribution of p would
look like if you continued to take samples of two indefinitely. We do this by
listing all possible results of two tosses and determining how probable each
result is. First consider the first toss. What two results are possible?


Heads or tails 


29. Is either possible result more likely than the other? 


30. Now consider the possible results of the second toss. What are they? 


Again, either heads or tails 


31. Will the results of the first toss have any influence on the second toss? 


32. Is either possible result of the second toss more likely than the other?


33. We can summarize the possible results of the two tosses in the following table:


(a) Heads on the first toss and heads on the second toss
(b) Heads on the first toss and tails on the second toss
(c) Tails on the first toss and heads on the second toss
(d) Tails on the first toss and tails on the second toss


34. Is any of these results more likely than any of the others?


No, each of the four results is equally likely.


35. The possible results of the two tosses are


(a) heads heads
(b) heads tails
(c) tails heads
(d) tails tails


Next to each result write the corresponding value of the statistic p.

$p = \dfrac{\text{number of successes (heads)}}{\text{number of tosses}}$


(a) p = 1.0
(b) p = 0.5
(c) p = 0.5
(d) p = 0.0


36. Draw a frequency distribution for the statistic p based on this mathematical
analysis.


Your distribution should look like this:

[Graph: percent of samples versus p, with 25% at p = 0, 50% at p = 0.5, and 25% at p = 1.0.]


37. If you continue to take samples of two and compute the statistic p, you will
find that the frequency distribution of the values of p comes closer and closer to
matching the theoretical distribution you have just deduced. If you feel so
inclined, take another 15 samples of two tosses and compute values of p.
Compare the actual distribution of the 20 values of p with the theoretical
distribution shown by the dark line on the graph below.

[Graph: theoretical sampling distribution of p drawn as a dark line; the vertical scale is marked 50% (10) and 100% (20).]


38. The distribution of a statistic is called a sampling distribution. The distributions
of p you have been working with are __________ distributions.


sampling
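
If you would rather not toss a coin hundreds of times, the comparison in frame 37 can be tried with a short simulation. The sketch below is only an illustration: it assumes a fair simulated coin, and the function names and the seed are invented for this example. It shows how the observed frequencies of p tend to drift toward the theoretical 25%, 50%, 25% as the number of samples grows.

```python
import random
from collections import Counter

# Theoretical sampling distribution of p for samples of two tosses (frame 36).
THEORETICAL = {0.0: 0.25, 0.5: 0.50, 1.0: 0.25}

def sample_proportion(rng, n_tosses=2, p_heads=0.5):
    """Toss a simulated coin n_tosses times and return the proportion of heads."""
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses

def observed_distribution(n_samples, seed=0):
    """Relative frequency of each value of p over n_samples samples of two tosses."""
    rng = random.Random(seed)  # arbitrary seed, used only so runs are repeatable
    counts = Counter(sample_proportion(rng) for _ in range(n_samples))
    return {p: counts[p] / n_samples for p in THEORETICAL}

# Compare a few sample counts with the theoretical values.
for n_samples in (20, 200, 20000):
    print(n_samples, observed_distribution(n_samples))
print("theoretical", THEORETICAL)
```

With only 20 samples the observed frequencies can be well off the theoretical values; with many thousands they should agree closely.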
39. Which of the distributions below (a, b, or c) is a sampling distribution?
Which is a sample distribution?
Which is a population distribution?


Figure a  Distribution of incomes in a random sample of 1000 households.
(Horizontal axis: income in thousands; vertical axis: percent.)

Figure b  Distribution of county populations in 1950 Census data.
(Horizontal axis: population in millions; vertical axis: number of counties.)

Figure c  Distribution of mean age in 1000 samples of 20 individuals each,
drawn from a population of mean age 30. (Horizontal axis: mean age of sample;
vertical axis: percent.)


Sampling distribution    (c)  This is the distribution of a statistic (the mean
                              ages of 1000 samples).
Sample distribution      (a)  This is the distribution of observations in a
                              sample (1000 random households).
Population distribution  (b)  This is the distribution of all possible observations
                              in the population (all counties in the US).


40. The distribution of the individual observations in a sample is called a
__________ distribution.


sample


41. The distribution of all conceivable observations is called a
__________ distribution.


population


42. The distribution of a statistic is called a __________ distribution.


sampling


43. The median weight of the 20,000 members of a national health club is 175 lb.
For the sake of this example we assume that no one weighs precisely 175 lb.
We should like to know what the probability is that three members, selected at
random, all weigh more than 175 lb. To find out we can use a method similar
to the one we used to analyze the coin-tossing problem. First, in the population
we are dealing with (members of the club) what proportion weighs more than
175 lb?


0.5. Remember, the median is the point in the distribution at which half the
measurements are higher and half lower. So, if half the population weighs
more than 175 lb,

$P = \dfrac{10{,}000}{20{,}000} = 0.5$


44.


45. 


46. 


47. 


We can use this information to construct a theoretical sampling distribution
for samples of three drawn from this population. What are the possible
outcomes of selecting the first member at random?


Over 175 
Under 175 


Is either more likely than the other? 


Draw a table like the one in frame 33 to show all possible results of selecting 
three members at random. 


First selection    Second selection    Third selection


First selection    Second selection    Third selection

Over 175           Over 175            Over 175
Over 175           Over 175            Under 175
Over 175           Under 175           Over 175
Over 175           Under 175           Under 175
Under 175          Over 175            Over 175
Under 175          Over 175            Under 175
Under 175          Under 175           Over 175
Under 175          Under 175           Under 175


48. Using the table, list all possible results of the three choices and compute the
proportion that weighs over 175 lb for each possible result. That is, count the
number of “successes” and divide by the number of trials.


Result                      p

Over    over    over       ___
Over    over    under      ___
Over    under   over       ___
Over    under   under      ___
Under   over    over       ___
Under   over    under      ___
Under   under   over       ___
Under   under   under      ___


Result                      p

Over    over    over       1.00
Over    over    under      0.67
Over    under   over       0.67
Over    under   under      0.33
Under   over    over       0.67
Under   over    under      0.33
Under   under   over       0.33
Under   under   under      0.00


49. Draw a sampling distribution for p under these conditions.


[Graph: percent of samples versus p, with bars at p = 0, 0.33, 0.67, and 1.0.]


50. We are interested in how often we could expect to draw a sample of three
members and find that all three weigh over 175 lb. What value of the statistic
p would this situation correspond to?


p = 1.0


51. Look at your sampling distribution for p. What percent of the time would
you expect to draw a sample that gave you a p of 1.0?


12.5% of the time
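
The listing of all possible results in frames 46 to 51 can be reproduced mechanically. The following sketch assumes, as the frames do, that each selection is equally likely to be over or under the median (P = 0.5); the function name is made up for this illustration. It enumerates every sequence of results and counts how often each value of p occurs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sampling_distribution(n):
    """Exact sampling distribution of p for n independent draws with P = 0.5.

    Every sequence of n successes and failures is equally likely, so the
    probability of a value of p is the share of sequences that produce it.
    """
    outcomes = list(product((1, 0), repeat=n))   # 1 = success, 0 = failure
    counts = Counter(Fraction(sum(o), n) for o in outcomes)
    return {p: Fraction(c, len(outcomes)) for p, c in sorted(counts.items())}

# Three selections from the health club: eight equally likely results.
for p, prob in sampling_distribution(3).items():
    print(f"p = {float(p):.2f}   probability = {float(prob):.3f}")
```

Only one of the eight sequences gives p = 1.0, which is where the 12.5% comes from. Calling the same function with n = 2 gives the coin-tossing distribution of frame 36.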


THE BINOMIAL PROBABILITY DISTRIBUTION 


The binomial probability distribution describes the sampling distribution of the 
statistic p for many possible values of P and many possible sample sizes. 


52. Using the same sort of reasoning you have just applied to the coin-tossing and
weight problems, it is possible to work out sampling distributions for p not only
for various sizes of sample but also for cases in which P is some other value
than 0.5. For example, suppose you know that in a large population one out
of four is a college graduate. What is P in this case?


P=0.25 or 1/4 
53.

BINOMIAL PROBABILITIES

                                             P
n  x   0.050  0.100  0.200  0.250  0.300  0.400  0.500  0.600  0.700  0.750  0.800  0.900  0.950

2  0   0.902  0.810  0.640  0.563  0.490  0.360  0.250  0.160  0.090  0.063  0.040  0.010  0.002
   1   0.095  0.180  0.320  0.375  0.420  0.480  0.500  0.480  0.420  0.375  0.320  0.180  0.095
   2   0.002  0.010  0.040  0.063  0.090  0.160  0.250  0.360  0.490  0.563  0.640  0.810  0.902

3  0   0.857  0.729  0.512  0.422  0.343  0.216  0.125  0.064  0.027  0.016  0.008  0.001  0.000
   1   0.135  0.243  0.384  0.422  0.441  0.432  0.375  0.288  0.189  0.141  0.096  0.027  0.007
   2   0.007  0.027  0.096  0.141  0.189  0.288  0.375  0.432  0.441  0.422  0.384  0.243  0.135
   3   0.000  0.001  0.008  0.016  0.027  0.064  0.125  0.216  0.343  0.422  0.512  0.729  0.857

4  0   0.815  0.656  0.410  0.316  0.240  0.130  0.062  0.026  0.008  0.004  0.002  0.000  0.000
   1   0.171  0.292  0.410  0.422  0.412  0.346  0.250  0.154  0.076  0.047  0.026  0.004  0.000
   2   0.014  0.049  0.154  0.211  0.265  0.346  0.375  0.346  0.265  0.211  0.154  0.049  0.014
   3   0.000  0.004  0.026  0.047  0.076  0.154  0.250  0.346  0.412  0.422  0.410  0.292  0.171
   4   0.000  0.000  0.002  0.004  0.008  0.026  0.062  0.130  0.240  0.316  0.410  0.656  0.815

5  0   0.774  0.590  0.328  0.237  0.168  0.078  0.031  0.010  0.002  0.001  0.000  0.000  0.000
   1   0.204  0.328  0.410  0.396  0.360  0.259  0.156  0.077  0.028  0.015  0.006  0.000  0.000
   2   0.021  0.073  0.205  0.264  0.309  0.346  0.312  0.230  0.132  0.088  0.051  0.008  0.001
   3   0.001  0.008  0.051  0.088  0.132  0.230  0.312  0.346  0.309  0.264  0.205  0.073  0.021
   4   0.000  0.000  0.006  0.015  0.028  0.077  0.156  0.259  0.360  0.396  0.410  0.328  0.204
   5   0.000  0.000  0.000  0.001  0.002  0.010  0.031  0.078  0.168  0.237  0.328  0.590  0.774


These sampling distributions are listed in a table called a binomial probability
table. A small excerpt from such a table is given above. To use the table you
must know what P is and what the sample size is. (The usual symbol for
sample size is n.) The sampling distribution is given in terms of the number of
“successes” rather than values of the statistic p. For example, in the case of
tossing coins P was 0.5 and the sample size n was 2. The appropriate sampling
distribution, the P = 0.500 column in the rows for n = 2, is circled in the table.
The number of “successes,” that is, the number of times the coin comes up heads,
is indicated by x. According to the circled portion of the table, how frequently
will there be two successes in a sample of two?


0.25; that is, 25% of the time 
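
This book does not derive the entries in the binomial probability table, but if you are curious they can be checked with the standard binomial formula, which is brought in here as outside knowledge rather than taken from the text. The sketch below (the function name is invented for this example) computes the probability of exactly x successes in n independent trials and prints the column for n = 2, P = 0.5.

```python
from math import comb

def binomial_probability(x, n, P):
    """Probability of exactly x successes in n independent trials,
    each with success probability P (standard binomial formula)."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

# Column for n = 2, P = 0.5 (the coin-tossing case).
for x in range(3):
    print(x, round(binomial_probability(x, 2, 0.5), 3))
```

The value for x = 2 agrees with the 0.25 read from the table. Other columns can be checked the same way, allowing for rounding in the printed table.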


54. For the weight problem, what was P? Look back at frame 43 if you do not
recall the problem.


P = 0.5


55. What was n?


n = 3 (n is the sample size)


56. Circle the appropriate sampling distribution in the table in frame 53. 


The appropriate sampling distribution is the P = 0.500 column in the rows for n = 3:

                                             P
n  x   0.050  0.100  0.200  0.250  0.300  0.400  0.500  0.600  0.700  0.750  0.800  0.900  0.950

3  0   0.857  0.729  0.512  0.422  0.343  0.216  0.125  0.064  0.027  0.016  0.008  0.001  0.000
   1   0.135  0.243  0.384  0.422  0.441  0.432  0.375  0.288  0.189  0.141  0.096  0.027  0.007
   2   0.007  0.027  0.096  0.141  0.189  0.288  0.375  0.432  0.441  0.422  0.384  0.243  0.135
   3   0.000  0.001  0.008  0.016  0.027  0.064  0.125  0.216  0.343  0.422  0.512  0.729  0.857


57. You are told that in the population of all doctors 9 out of 10 recommend 
Potter’s Pills. Assuming this is true, you want to know the probability of 
choosing two doctors at random and finding that neither of them recom- 
mends Potter’s Pills. What are P and n for this case? 


P = 0.9 (9/10)
n = 2
58. What value of x corresponds to the case in which neither of the two doctors
recommends Potter’s Pills? (A doctor who recommended Potter’s Pills would
be a “success.”)


x = 0


59. How likely is it that neither doctor will recommend Potter’s Pills? Use the
table in frame 53.


The probability of this event is 0.010; that is, it should happen 1% of the time.


60. Let us make the problem more complicated. Suppose you plan to choose four
doctors at random and you want to know the probability that no more than two
will recommend Potter’s Pills. To come up with an answer you must take into
account the case in which none of them recommends Potter’s Pills (zero
“successes”), the case in which only one recommends Potter’s Pills and three
do not (one “success”), and the case in which two recommend Potter’s Pills
and two do not (two “successes”). You will find the probability of each of
these cases in the sampling distribution; then you will add the probabilities
for each of these cases. What are P and n for this problem? What is the
probability that one of these three situations will exist?


P = 0.9, n = 4. The probability that x is less than or equal to 2 is 0.053. To
find this answer you had to add the probabilities of the three situations of
interest. The probability that x = 0 is too small to enter in the table. The
probability that x = 1 is 0.004 and the probability that x = 2 is 0.049.
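
Adding up several table entries, as in frame 60, is easy to check by machine. The sketch below again relies on the standard binomial formula (an outside assumption, not something derived in this book), and its function names are invented for illustration. It computes the probability of two or fewer successes for n = 4 and P = 0.9 and sets it beside the sum of the rounded table entries.

```python
from math import comb

def binomial_probability(x, n, P):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

def probability_at_most(k, n, P):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binomial_probability(x, n, P) for x in range(k + 1))

exact = probability_at_most(2, 4, 0.9)
from_table = 0.000 + 0.004 + 0.049     # rounded table entries for x = 0, 1, 2
print(f"exact: {exact:.4f}   from table: {from_table:.3f}")
print(f"three or four recommendations: {1 - exact:.4f}")
```

The exact figure is a little below the 0.053 obtained from the table because each table entry has been rounded to three decimal places.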


61. The total of all the probabilities in a sampling distribution is always 1.0. If the
probability of no, one, or two recommendations for Potter’s Pills is 0.053,
what is the probability of three or four recommendations?


0.947 (1.000 - 0.053)


62. Five percent of the population of the United States possesses a particular
genetic trait. You want to know the chances of finding at least one person who
has this trait in a random sample of 15. Use Table III in the back of the book.


P = 0.05, n = 15. The probability is 0.537, or about 54%. The easiest way to
find the answer is to find the probability of x = 0, that is, the probability that
no one in the sample will have the trait. The probability that no one will have
the trait is 0.463; therefore the probability that at least one person will have
the trait is 1.000 - 0.463 = 0.537.


You could also have added up the probabilities for x = 1, x = 2, x = 3, and
so on. The sum of these probabilities from the table is 0.538. Because the table
entries are rounded off, you will find occasional minor discrepancies like this.
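
The shortcut in frame 62, subtracting the probability of no successes from 1, can be written in one line. This is a minimal sketch that assumes independent observations, as the binomial table does; the function name is made up for the example.

```python
def probability_at_least_one(n, P):
    """Probability of one or more successes in n independent trials:
    1 minus the probability that every trial is a failure."""
    return 1 - (1 - P) ** n

print(round(probability_at_least_one(15, 0.05), 3))
```

Working from the complement avoids adding fifteen separate table entries and the rounding discrepancies that come with them.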
63. When you use the binomial probability table, you are making an important
assumption that you should be aware of. You are assuming that the observations
in your sample are random and independent; that is, you are assuming that
every observation in the population has an equal chance of being chosen at any
step in the sampling process. Suppose you are going to choose three people
out of a population of six that consists of three men and three women. You
choose to select them one at a time. On the first draw you choose a man. Does
this affect the chances of your choosing a man on the second draw?


Yes, because there are now substantially fewer men in the remaining population.
For the first draw P = 3/6 = 0.5; for the second draw P = 2/5 = 0.4.


64. You have a jar containing six marbles, three red and three blue. You draw
one marble at random and find that it is blue. You record this observation and
replace the marble in the jar. Has your first observation affected the chances
of choosing a blue marble on the second draw?


No, because you replaced the blue marble.


65. Can you use the binomial probability table to analyze the marble problem?


Yes. The observations are independent. One observation does not affect the
probability of the others.


66. Can you use the binomial probability table to analyze the problem in frame
63?


No. The observations are not independent.
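
To see how much the lack of independence matters in a population this small, you can multiply out the draw-by-draw probabilities. The sketch below is an illustration only (the function names are invented): it compares the chance of choosing three men in a row from three men and three women when people are not replaced with the chance when each person is replaced before the next draw.

```python
from fractions import Fraction

def all_men_without_replacement(men, women, draws):
    """Probability that every draw is a man when people are not replaced."""
    prob = Fraction(1)
    for i in range(draws):
        # After i men have been removed, men - i remain out of men + women - i people.
        prob *= Fraction(men - i, men + women - i)
    return prob

def all_men_with_replacement(men, women, draws):
    """Probability that every draw is a man when each person is replaced."""
    return Fraction(men, men + women) ** draws

print(all_men_without_replacement(3, 3, 3))   # 3/6 * 2/5 * 1/4
print(all_men_with_replacement(3, 3, 3))      # (3/6) ** 3
```

The second figure is the one the binomial table would give for P = 0.5 and n = 3; the first is the true probability for the problem in frame 63, and it is less than half as large.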


Procedures for dealing with this problem exist, but they are beyond the scope of this 
book. 


67. As a rule of thumb, if the population is at least 20 times as large as the sample,
you can disregard this effect of not being able to include one individual in your
sample twice. If you are choosing a sample of six out of a population of 500,
can you use the binomial probability table?


Yes


68. If you are choosing a sample of six out of a population of 75, can you use
the binomial probability table?


No. The population of 75 is less than 20 times the sample size (20 x 6 = 120).


69. You recall that if nine out of 10 doctors recommend Potter’s Pills the
probability of selecting two doctors at random who do not recommend them is
0.010. You go to a nearby medical school and ask the first two doctors you meet
whether they recommend Potter’s Pills. Have all members of the population an
equal chance of being chosen?


No. The chance of choosing doctors connected with that medical school is
much greater than the chance of choosing other doctors.


70. Can you say that the probability of two “no” answers is 0.010?


71.


All members of the population have an equal chance of being selected at any
point in the sampling procedure.
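
The rule of thumb in frame 67 amounts to a single comparison. As a small sketch (the function name and the `ratio` parameter are invented for this illustration), it can be written so that you can try different population and sample sizes:

```python
def can_use_binomial_table(population_size, sample_size, ratio=20):
    """Rule of thumb from frame 67: sampling without replacement is close
    enough to independent draws when the population is at least `ratio`
    times as large as the sample."""
    return population_size >= ratio * sample_size

print(can_use_binomial_table(500, 6))   # 500 is at least 20 x 6 = 120
print(can_use_binomial_table(75, 6))    # 75 is less than 120
```

The check covers only the size of the population. It says nothing about whether the sample was chosen at random, which is the separate issue raised in frame 69.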
THE NORMAL DISTRIBUTION


It is possible to deduce the sampling distributions of a number of other statistics 
mathematically. This book will not attempt to explain the mathematical reasoning 
involved in deducing these sampling distributions. Instead, our focus will be on how 
and why sampling distributions are useful in drawing conclusions about populations
and samples.


72. One of the most useful sampling distributions is the one for the mean of a large
sample of measurements. This distribution is called the normal distribution.
Before explaining it, let’s review the symbols we will be using.

$\mu$ is the symbol for the __________ of the (sample/population).
$\sigma$ is the symbol for the __________ of the (sample/population).


mean of the population
standard deviation of the population


73. What does n stand for?


Size of the sample


74. $\mu$ is a (parameter/statistic)?


parameter


75. The sample mean is a (parameter/statistic)?


statistic
76. We represent the sample mean by the symbol $\bar{x}$; for example, if the mean
height of all women in the United States is 5 ft 4 in., and a sample of 10 women
has a mean height of 5 ft 9 in.,

$\mu$ = ____
$\bar{x}$ = ____


$\mu$ = 5 ft 4 in.
$\bar{x}$ = 5 ft 9 in.


77. If we take a large number of samples and compute an $\bar{x}$ for each sample, we
can also compute a mean for the $\bar{x}$’s as well as a standard deviation. We call
these values $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$. Complete the following table:


                      Sample    Sampling Distribution of Mean    Population

Mean                  ____      ____                             ____
Standard deviation    ____      ____                             ____


                      Sample       Sampling Distribution of Mean    Population

Mean                  $\bar{x}$    $\mu_{\bar{x}}$                  $\mu$
Standard deviation    $s$*         $\sigma_{\bar{x}}$               $\sigma$


*You will learn more about s in Chapter 3.
[Figure: Normal distribution.]


78. According to what is called the central limit theorem, the sampling distribution
of $\bar{x}$ tends to have the shape of the normal distribution curve shown above,
with

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

When the sample size $n$ is 30 or more, the sampling distribution is almost exactly
like the illustration. Suppose we took a great number of samples of 36 individuals
from a population with a mean age of 40 and a standard deviation of six
years. What would the mean of the sampling distribution be?


79. What would the standard deviation of the sampling distribution be?


80. According to the illustration, what percent of the time would you expect the
mean of one sample to be between 40 and 41?


81. What percent of the time would you expect the mean of one sample to be
between 39 and 41?
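
The questions in frames 78 to 81 can be worked numerically once the mean and standard deviation of the sampling distribution are known. The sketch below is one way to do it, using Python's `statistics.NormalDist` in place of the printed illustration; the variable names are chosen for this example only, and the sampling distribution is treated as exactly normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 40, 6, 36          # population mean, population SD, sample size

mean_of_means = mu                # mean of the sampling distribution
sd_of_means = sigma / sqrt(n)     # standard deviation of the sampling distribution

sampling = NormalDist(mean_of_means, sd_of_means)
between_40_and_41 = sampling.cdf(41) - sampling.cdf(40)
between_39_and_41 = sampling.cdf(41) - sampling.cdf(39)

print(mean_of_means, sd_of_means)
print(round(between_40_and_41, 3), round(between_39_and_41, 3))
```

Because the standard deviation of the sampling distribution works out to one year, the interval from 40 to 41 is one standard deviation wide, which is why the first probability matches the 34.1% quoted for the normal curve.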


82. [Figure: population distribution of time to solve the puzzle, with $\mu = 75$
and $\sigma = 64$; the horizontal axis (time) is marked at 75 and 150.]


According to the central limit theorem, the shape of the sampling distribution of
$\bar{x}$ for large samples will be a normal curve, no matter what the shape of the
population distribution. The illustration above shows the population distribution
for time to solve a particular puzzle for 6000 individuals. A large number of
samples of 49 individuals each are selected and $\bar{x}$ is computed for each sample.
Which of the illustrations below will most resemble the sampling distribution of $\bar{x}$?


(c). No matter what the population distribution or the sample distributions look
like, the sampling distribution of the statistic $\bar{x}$ for large samples will have this
bell shape.
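
You can watch the central limit theorem at work with a simulation. The sketch below does not use the book's puzzle data, which are not given; instead it assumes a skewed stand-in population (a gamma distribution picked so that its mean is 75 and its standard deviation is 64) and draws 2000 samples of 49. The function name and the seed are invented for the example.

```python
import random
from statistics import mean, stdev

def sample_means(n_samples, sample_size, seed=1):
    """Means of repeated random samples from a skewed population.

    The population is a gamma distribution chosen (as an assumption) to
    have mean 75 and standard deviation 64, like the puzzle times.
    """
    rng = random.Random(seed)     # arbitrary seed so runs are repeatable
    shape = (75 / 64) ** 2        # shape * scale = 75
    scale = 64 ** 2 / 75          # shape * scale**2 = 64**2
    return [
        sum(rng.gammavariate(shape, scale) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

means = sample_means(n_samples=2000, sample_size=49)
sd_expected = 64 / 49 ** 0.5      # sigma / sqrt(n), about 9.1

# Share of sample means within one sd_expected of 75 (about 68% for a normal curve).
share_within_one_sd = sum(abs(m - 75) <= sd_expected for m in means) / len(means)

print(round(mean(means), 1), round(stdev(means), 1), round(share_within_one_sd, 2))
```

Even though the individual times are strongly skewed, the sample means should cluster symmetrically around 75 with a spread close to 64 divided by the square root of 49.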
83. What will the mean of the sampling distribution be? What will its standard
deviation be? Remember that $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.


84. The shape of the sampling distribution of $\bar{x}$ is called a normal distribution. In
this normal distribution there is always 34.1% of the observations between
$\mu_{\bar{x}}$ and $\mu_{\bar{x}} + \sigma_{\bar{x}}$, 34.1% of the observations between
$\mu_{\bar{x}} - \sigma_{\bar{x}}$ and $\mu_{\bar{x}}$, 47.7% of the observations between
$\mu_{\bar{x}}$ and $\mu_{\bar{x}} + 2\sigma_{\bar{x}}$, and so on. The exact shape
of the distribution is completely defined by the two parameters ____ and ____.


85. If we know how many standard deviations away from the mean an observation
is, we can tell how likely it is to occur in random sampling. For this reason it is
convenient to convert our measurements into z scores. A z score is simply the
number of standard deviations away from the mean a particular measurement is
located; for example, a score one standard deviation above the mean has a z
score of +1; a score one and one-half standard deviations below the mean has
a z score of -1.5.


If a distribution has a mean of 15 and a standard deviation of 2, what z score
corresponds to a raw score of 19? What z score corresponds to a raw score of
14?


z = +2; that is, two standard deviations above the mean
z = -0.5; that is, one-half standard deviation below the mean
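
The conversion described in frame 85 is a subtraction followed by a division. A minimal sketch, with a function name chosen for this example:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(z_score(19, 15, 2))   # 2.0
print(z_score(14, 15, 2))   # -0.5
```

A positive result means the score is above the mean and a negative result means it is below.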
86. The formula for converting any measurement to a z score is

$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$

If $\mu_{\bar{x}} = 15$ and $\sigma_{\bar{x}} = 2$, what z score corresponds to a raw score of 20?


87. The table that gives the normal frequency distribution is usually called “areas
under the normal curve” and is organized in terms of z scores. If you want to
know the probability of a particular $\bar{x}$ from a known population, you must first
compute a


z score
88. AREAS UNDER THE NORMAL CURVE

Area = probability

An entry in the table is the proportion under the entire curve which is between
z = 0 and a positive value of z. Areas for negative values of z are obtained by
symmetry.

Second decimal place of z


The table of areas under the normal curve gives you the probability of a sample
mean between $\mu_{\bar{x}}$ and $\mu_{\bar{x}} + z\sigma_{\bar{x}}$; for example, if you
want to know the probability of a sample mean between $\mu_{\bar{x}}$ and
$\mu_{\bar{x}} + 1.00\sigma_{\bar{x}}$, you will look in the table under
z = 1.00. The probability in this case is 0.3413. What is the probability of a
sample mean between $\mu_{\bar{x}}$ and $\mu_{\bar{x}} + 1.50\sigma_{\bar{x}}$?
89. A population has a mean of 20 and a standard deviation of 4. You plan to
choose a sample of 64 at random. What is the probability of a sample mean
between 20 and 21? To answer this question you must find the mean and
standard deviation of the sampling distribution and use them to convert 21 to a
z score.

What is the mean of the sampling distribution? What is the standard deviation
of the sampling distribution? What z score corresponds to 21?


$\mu_{\bar{x}} = 20$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4}{\sqrt{64}} = 0.5$

$z = \dfrac{21 - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{1}{0.5} = 2$


90. Use the table on page 269 to find the probability of a sample mean between
20 and 21.


91. Often you will have to add or subtract values you find in the table to answer
a question; for example, what is the probability of a sample mean between
19 and 21? Break this down into the probability of an $\bar{x}$ between 20 and 21
____ and the probability of an $\bar{x}$ between 20 and 19 ____. The answer is ____.


92. It helps to remember that the total of all probabilities in a sampling distribution
is always 1.0000. If the probability of a sample mean between 19 and 21 is 0.9544
(i.e., 95% of the time), what is the probability of a sample mean that is not
between 19 and 21 (either less than 19 or more than 21)?


0.0456 (1.0000 - 0.9544), that is, 5%


93. The total of all probabilities on one side of the mean is 0.5000. A sample mean
will be above the mean of the sampling distribution half the time. What is the
probability of a sample mean greater than 21?


0.0228 (0.5000 - 0.4772)


Reference formulas

$\mu_{\bar{x}} = \mu$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$


94. An airline has determined that the mean weight of a passenger’s baggage is
28.5 kg, with a standard deviation of 5 kg. If we take random samples of 100
passengers, how often can we expect to find a sample mean over 30 kg?


Population distribution

$\mu = 28.5$
$\sigma = 5.0$

Sampling distribution

$\mu_{\bar{x}} = 28.5$
$\sigma_{\bar{x}} = \dfrac{5.0}{\sqrt{100}} = 0.5$

z score corresponding to 30 kg

$z = \dfrac{30 - 28.5}{0.5} = 3.0$

Probability of $\bar{x}$ between 28.5 and 30 (from table): 0.4987
Probability of $\bar{x}$ over 30: 0.0013 (0.5000 - 0.4987), that is, about once in a thousand times
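
The table lookups in frames 89 to 94 all follow the same three steps: find the standard deviation of the sampling distribution, convert the value of interest to a z score, and read off an area. The sketch below strings those steps together, using `statistics.NormalDist` as a stand-in for the printed table of areas; the function name is invented for this example, and the sampling distribution is assumed to be normal.

```python
from math import sqrt
from statistics import NormalDist

def probability_mean_above(threshold, mu, sigma, n):
    """Probability that the mean of a random sample of size n exceeds threshold,
    treating the sampling distribution of the mean as normal."""
    sd_of_means = sigma / sqrt(n)           # sigma / sqrt(n)
    z = (threshold - mu) / sd_of_means      # z score of the threshold
    return 1 - NormalDist().cdf(z)          # area above z under the normal curve

print(round(probability_mean_above(21, 20, 4, 64), 4))      # frame 93
print(round(probability_mean_above(30, 28.5, 5, 100), 4))   # frame 94
```

Both results should agree with the table-based answers to within the rounding of the table.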


95. The airline’s planes hold 100 passengers and, to the delight of the stockholders,
are always full. Can we consider a planeload of passengers as a random sample
of 100 airline passengers? Has every member of the population an equal chance
of being selected at each stage of the sampling process?


No, a planeload is not a random sample. For example, the presence of one 
member of a group tour or a family on a plane greatly increases the 
probability of other members of the same group being on the plane. 


96. According to the central limit theorem, the shape of the sampling distribution
of $\bar{x}$ for large samples tends to be normal, with $\mu_{\bar{x}}$ = ____ and $\sigma_{\bar{x}}$ = ____.
97. The shape of the sampling distribution of $\bar{x}$ for large samples (n greater than 30)
is called


normal


98. For large samples (n greater than 30) the sampling distribution of $\bar{x}$ tends to be
normal, with mean and standard deviation related to the population parameters
as follows:

$\mu_{\bar{x}} = \mu$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

This is a statement of the __________ theorem.


central limit


99. On a separate sheet of paper, state the central limit theorem in your own
words.


Your answer should have included these points: 

(a) The theorem has to do with the sampling distribution of the means of large 
samples (greater than 30). 

(b) The shape of this sampling distribution tends to be normal. 

(c) The mean of this distribution is equal to the population mean. 

(d) The standard deviation of this distribution equals the population standard 
deviation divided by the square root of the sample size. 
REVIEW PROBLEMS


If you have successfully completed this chapter, you can now use information about
a population to make predictions about samples. You can:


tell the difference between a population and a sample;

use a binomial probability table to predict sample proportions, given the
population proportion;

use a normal distribution table to predict sample means, given the mean and
standard deviation of the population.


Now try these review problems. Table I on pp. 253-256 lists any formulas you
may need for reference.


1. Write the term that corresponds to each of the following descriptions:

(a) A group of observations chosen at random to represent a larger group of
possible observations.

(b) All possible observations of a given type.

(c) A number that summarizes a population of observations.

(d) A number that summarizes a sample of observations.


2. You have two complete decks of cards with no jokers. You shuffle the cards
thoroughly and draw five cards. What is the probability that you will draw four
or more clubs?


3. You take only the face cards and aces from a deck of cards and discard the others.
Then you shuffle the cards thoroughly and draw two cards. Can you use the
binomial probability table to calculate the probability of drawing two clubs? Why?


4. The mean diameter of marbles manufactured at a particular toy factory is
0.850 cm with a standard deviation of 0.010 cm. What is the probability of
selecting a random sample of 100 marbles that has a mean diameter greater than
0.851 cm?


Answers 


To review a problem, study the frames indicated after the answer. 


1. (a) sample

(b) population

(c) parameter

(d) statistic

See frames 1 to 17.


2. The population is 104 and the sample size 5, so you may use the binomial
probability table even though the cards are not replaced.

P = 0.25 (there are equal numbers of the four suits).
n = 5 (you will draw five cards).
x = 4, 5 (you are interested in four or more successes).

From the binomial probability table you find these values: 0.015, 0.001. The
probability of drawing four or more clubs is 0.016; it would happen about 1.6%
of the time. See frames 52 to 71.


3. No. Your deck (the population) consists of only 16 cards. Since this is less than
20 times the sample size, you cannot use the binomial probability table unless you
replace the card after each individual draw. See frames 63 to 68.


4. μ_x̄ = 0.850
σ = 0.010
σ_x̄ = σ/√n = 0.010/√100 = 0.001

    z = (x̄ − μ_x̄)/σ_x̄ = (0.851 − 0.850)/0.001 = 0.001/0.001 = 1

From the normal probability table the probability of x̄ between 0.850 and 0.851
is 0.341. The probability of x̄ greater than 0.851 is 0.500 − 0.341 = 0.159, or
15.9%. See frames 72 to 99.
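
If you have a computer handy, you can check answers 2 and 4 without the tables. The following is a minimal sketch using only the Python standard library; the function and variable names are illustrative, not part of the book.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_upper_tail(n, p, k):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

# Answer 2: four or more clubs in five draws, treating the draws as binomial.
p_four_or_more_clubs = binomial_upper_tail(n=5, p=0.25, k=4)

# Answer 4: probability that the mean of 100 marbles exceeds 0.851 cm.
sigma_xbar = 0.010 / sqrt(100)            # standard deviation of the sample mean
z = (0.851 - 0.850) / sigma_xbar          # about 1.0
p_mean_above = 1 - NormalDist().cdf(z)    # area in the upper tail beyond z

print(f"P(4 or more clubs) = {p_four_or_more_clubs:.3f}")
print(f"P(mean > 0.851)    = {p_mean_above:.3f}")
```

The results agree with the table-based answers to three decimal places.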


CHAPTER THREE 
Estimating 


Often we need to estimate the characteristics of a population on the basis of 
information about a sample. For example, you may want to estimate the mean 
height of all the trees in the forest without having to measure every tree. Common 
sense suggests that you can get a pretty good idea of the height of all trees in the 
forest by measuring a random sample of trees. In fact, by using sampling-distribution
tables, you can translate the description “pretty good” into a mathematical
statement of your level of confidence. This chapter will teach you how to estimate 
population parameters on the basis of data from a sample and how to establish 
mathematically defined confidence intervals for these estimates. 
When you have completed this chapter, you will be able to: 


• estimate μ, P, and σ;
• establish confidence intervals for P;
• establish confidence intervals for μ.


ESTIMATION 


Often we use a sample as a basis for estimating the value of population parameters. 
Even though we know that the sample statistic is not identical to the population
parameter, the statistic is our best estimate of the parameter. 


1. If the mean weight of 30 cartons of “large” eggs is 30 oz, what would your best 
estimate be of the mean weight of all cartons of “large” eggs? 


3. A marketing research organization reports that 18% of the television audience
was watching “The Return of the Creature from the Black Lagoon” on a given
evening. Do you think this proportion is a parameter or a statistic?


A statistic. This report is based on a sample of television watchers. 


4. In this case p, a (parameter/statistic), is being used as an estimate of P, a 
(parameter/statistic). 


statistic
parameter


5. When we wish to estimate the standard deviation of a population on the basis
of a sample distribution, we encounter a problem. The number that you obtain
by applying the formula for σ to the sample data is not a good estimate of the
population standard deviation. Instead, a slightly different formula must be
used.


An explanation would be too long to present here, but with small samples the
formula for σ tends to underestimate the standard deviation of the population. That
is, on the average, the standard deviation of a sample tends to be smaller than the
standard deviation of the population.


The two formulas are:


Standard deviation              Estimate of population
of population                   standard deviation
                                based on sample data

σ = √( Σ(x − μ)² / n )          s = √( Σ(x − x̄)² / (n − 1) )


The symbol for population standard deviation is __.
7. In these formulas, σ is a (parameter/statistic); s is a (parameter/statistic). The
formula for σ contains the expression n where the formula for s contains the
expression ______ and μ where the formula for s contains ______.


σ is a parameter; s is a statistic
n − 1
x̄


10. Which formula contains the expression n − 1?


The formula for s.


11. If you wish to estimate the standard deviation of the weight of all cartons of
“large” eggs on the basis of a sample of 30 cartons, which formula would you
use (write out the whole formula)?


s = √( Σ(x − x̄)² / (n − 1) )


12. If 20 cartons of eggs constitute a population and you wish to compute the 
standard deviation of this population, which formula will you use (write out 
the whole formula)? 


13. The following represent the number of words in the speaking vocabulary of
five children. Describe this group in terms of mean and standard deviation.
100, 100, 300, 400, 600.


μ = 300

σ = √(180000/5) = 189.7

Since you are not trying to estimate the parameters of some larger population,
you should have used the formula for σ.
14. Five children of a given age in an appropriately selected sample have the
following number of words in their speaking vocabularies. What is the best
estimate you can make of the mean and standard deviation for all children of
that age?


100, 100, 300, 400, 600.


x̄ = 300

s = √(180000/4) = 212.1


Since this is an estimation problem, you should have used the formula for s.
(As you will see, with such a small sample these estimates are not very reliable;
but they are the best estimates you can make.)
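
The two vocabulary problems use the same five numbers with different divisors. As a rough sketch, Python's standard `statistics` module provides both calculations: `pstdev` divides by n (the formula for σ) and `stdev` divides by n − 1 (the formula for s). The variable names are illustrative.

```python
import statistics

words = [100, 100, 300, 400, 600]

mean = statistics.mean(words)       # same value whether the group is a population or a sample
sigma = statistics.pstdev(words)    # population formula: divide the sum of squares by n
s = statistics.stdev(words)         # estimation formula: divide the sum of squares by n - 1

print(f"mean = {mean}, sigma = {sigma:.1f}, s = {s:.1f}")
```

With only five observations the two results differ noticeably, which is why it matters which formula you choose.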


15. If you compute both s and σ for the same data, which will be larger?


s. The divisor is smaller. 


16. With a sample size of 1000, would you expect the difference between σ and s
to be substantial?


No. The difference between a given number divided by 1000 and the same number
divided by 999 is not very great.
17. With a sample size of 4, would you expect the difference between σ and s to
be substantial?


Yes. There is a substantial difference between x/3 and x/4.


18. What statistic is used as an estimate of P?


19. What statistic is used as an estimate of σ?


20. What statistic is used as an estimate of μ?
CONFIDENCE INTERVALS FOR μ


The amount of confidence you can put in an estimate of a parameter varies. In some 
cases you can say that it is virtually inconceivable for your estimate to be wrong by 
an amount that has any practical significance. In other cases you must consider your 
estimate, at best, a “ball-park” figure, an indicator only of the approximate 
magnitude of the parameter of interest. One useful way to deal with this problem is 
to establish a confidence interval for your estimate; for example, you might say “the 
mean number of characters per message in our data communications system is 
between 307 and 313 with 98% confidence.” 


21. To clarify how you can establish a confidence interval, and what it means, let
us consider a somewhat artificial example. Suppose you are trying to estimate
the mean of a population in a situation in which you know the population
variability.

A factory has two machines that cut spaghetti and noodles into appropriate
lengths for packaging. Both machines can be adjusted to cut any given length.
The variability of the two machines differs, however. The standard deviation of
pasta cut on machine A is 0.1 cm. The standard deviation of pasta cut on
machine B is 1.0 cm.


If the factory manager wants all noodles in a batch to be as nearly as possible 
the same length, which machine will he use to cut them? 


Machine A, because its output is less variable. 


22. Now consider what would happen if you took a number of samples of pasta
cut by machine A and computed the mean of each sample. Suppose for example
that the machine is set to cut pasta 30 cm in length and you take a number of
36-noodle samples. We can predict the sampling distribution for machine A on
the basis of the central limit theorem. Complete the following table:


                    Machine A

Population          μ = 30 cm
                    σ = 0.1 cm

Samples of 36       μ_x̄ = ______
                    σ_x̄ = ______
Machine A

μ_x̄ = 30 cm

σ_x̄ = 0.1/√36 = 0.0167 cm


23. ... of μ?


0.6826. Roughly 68% of the time. Since σ_x̄ = 0.0167, z = 1.0. The entry in the
normal distribution table for z = 1.0 is 0.3413. You are interested in values
either above or below μ, so you must double the value in the table.


24. “Under the circumstances described (σ = 0.1, n = 36), x̄ will be within 0.0167 cm
of μ 68% of the time.” True or false?


True 


25. Now, suppose that μ is not known. If σ and n remain unchanged, this
statement will be true, no matter what the value of μ. If μ is unknown, there is
a 68% chance that we will be correct in predicting that it is within 0.0167 cm of
whatever x̄ we compute from a sample; for example, if we obtain an x̄ of 20
from our sample, there is a 68% chance that μ is between 20 + 0.0167 and
20 − 0.0167.


We can say that μ is between 19.9833 and ______ with 68% confidence.


20.0167 


26. If you are going to draw practical conclusions from your estimates, you will
probably want a higher confidence level than 68%; for example, you might want
to take only a 5% risk of making an incorrect estimate. Then you would choose
a confidence level of ____
27. Assume that you have taken a sample of 36 noodles cut on machine A and have
found that the mean length of the 36 noodles is 25 cm. You now want to determine
the exact setting of the machine with 95% confidence. Here is how to
establish a 95% confidence interval:

(a) Decide what confidence level you will use. We chose 95%.

(b) Divide the confidence level by two to find what percentage of the sampling
distribution must be included on each side of the mean. To include 95%
you must include 47.5% on each side of the mean.

(c) Look in the normal distribution table to find the z score that includes the
appropriate percentage. We find that 0.475 corresponds to a z score of
1.96.

(d) The z score tells you how many standard deviations on either side of the
mean you must go to include the desired percentage. You must multiply it
by the standard deviation of the sampling distribution to obtain a value in
centimeters.

    1.96 × 0.0167 = 0.033 cm

(e) Add the value in centimeters to the sample mean to obtain the upper limit
of the confidence interval and subtract it from the sample mean to obtain
the lower limit. Since the sample mean was 25.000, the upper limit is
______. The lower limit is ______.
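
Steps (a) through (e) translate almost line for line into a short program. The sketch below assumes Python's standard `statistics.NormalDist` as a stand-in for the normal distribution table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 25.0, 0.1, 36     # sample mean, known population sd, sample size

confidence = 0.95                                # step (a): choose the confidence level
half_area = confidence / 2                       # step (b): 0.475 on each side of the mean
z_c = NormalDist().inv_cdf(0.5 + half_area)      # step (c): z score, about 1.96
margin = z_c * sigma / sqrt(n)                   # step (d): convert z to centimeters
lower, upper = x_bar - margin, x_bar + margin    # step (e): subtract and add

print(f"z_c = {z_c:.2f}, margin = {margin:.3f} cm")
print(f"limits: {lower:.3f} cm to {upper:.3f} cm")
```

Running it is a way to check the limits you filled in by hand.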


28. Now assume that you want to compute a 99% confidence interval on the basis
of the same sample:

x̄ = 25.000 cm
σ = 0.1 cm

Compute a 99% confidence interval for μ. Refer to frame 27, if you wish, as you
complete these steps.

(a) What is the confidence level?
(b) What percent on each side of the mean?
(c) What z score?
(d) How many centimeters?
(e) What are the upper and lower limits?
(a) 99%
(b) 49.5%
(c) 2.58
(d) 0.043 cm (2.58 × 0.0167)
(e) 24.957-25.043


29. z_c is the symbol used to indicate the z score that corresponds to your choice of
a confidence level; for example, with a 95% confidence level z_c = 1.96. With a
99% confidence level z_c = ______


30. The formula for a confidence interval for μ is

    x̄ ± z_c (σ/√n)


The symbol ± is read “plus or minus.” In this formula it tells you to add to
find the upper limit and subtract to find the lower limit. Using this formula as
a reminder, compute the 99% confidence interval for the setting of machine B
on the basis of the following information:

x̄ = 25
σ = 1.0
n = 36

μ is between ______ and ______ with 99% confidence.


24.57 and 25.43 
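
The formula x̄ ± z_c (σ/√n) can be wrapped in a small reusable function. This is a sketch only: `z_interval` is a hypothetical helper name, and `statistics.NormalDist` replaces the printed normal table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the population mean when sigma is known."""
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2)   # z score for the chosen level
    margin = z_c * sigma / sqrt(n)                     # half-width of the interval
    return x_bar - margin, x_bar + margin

# Machine B: sample mean 25, sigma 1.0, n 36, 99% confidence.
low_b, high_b = z_interval(x_bar=25, sigma=1.0, n=36, confidence=0.99)
print(f"machine B: {low_b:.2f} to {high_b:.2f}")
```

Because machine B is ten times as variable as machine A, its interval is ten times as wide for the same sample size and confidence level.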


31. What z_c corresponds to a confidence level of 90%?
CONFIDENCE INTERVALS FOR μ WHEN σ IS UNKNOWN


When you do not know the population standard deviation, you can still calculate a
confidence interval for μ on the basis of sample statistics. To do this, you must know
both the mean and the standard deviation of the sample.


32. In the cases we have discussed so far you are estimating μ, but you know the
exact value of the population parameter σ. Do you think this would be a common
state of affairs?


No. You are most likely to encounter situations in which both μ and σ are
unknown.


33. If the sample size is 30 or more, you can obtain a close estimate of the true
confidence interval by substituting an estimate for σ. What is the estimate for
σ?
34. Reference formulas:

Confidence interval for μ:  x̄ ± z_c (σ/√n)

Estimate of σ:  s = √( Σ(x − x̄)² / (n − 1) )


You are studying the time it takes for an air-traffic controller to read a complex
radar display and react to it under certain standardized conditions. By conducting
one trial with each of 36 randomly selected controllers you obtain the following
statistics.


x̄ = 3.15 seconds,  Σ(x − x̄)² = 20.00,  n = 36.


What is the 95% confidence interval for mean time to read and react to the
display?


(a) 95% confidence interval
(b) 47.5% on each side of the mean
(c) z_c = 1.96
(d) s = √( Σ(x − x̄)² / (n − 1) ) = √(20/35) = 0.755;  z_c (s/√n) = 1.96 (0.755/√36) = 0.25
(e) The confidence interval is 3.15 ± 0.25;
μ is between 2.90 and 3.40 with 95% confidence.
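
The same calculation can be sketched in a few lines of Python. The value 1.96 is the table value of z_c used above; the variable names are illustrative. Carried to more decimal places, s comes out near 0.756 rather than 0.755, which does not change the rounded interval.

```python
from math import sqrt

x_bar = 3.15          # sample mean, in seconds
sum_sq_dev = 20.00    # sum of squared deviations from the sample mean
n = 36
z_c = 1.96            # table value for 95% confidence

s = sqrt(sum_sq_dev / (n - 1))      # estimate of sigma (divide by n - 1)
margin = z_c * s / sqrt(n)          # z_c times the estimated sd of the sample mean

print(f"s = {s:.3f}, margin = {margin:.2f}")
print(f"mu is between {x_bar - margin:.2f} and {x_bar + margin:.2f}")
```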
35. The normal probability table gives the sampling distribution of x̄ for samples of
30 or more. Can you use it to establish a confidence interval for μ on the basis
of a sample of 10?


36. To use the normal probability table for establishing a confidence interval for
μ you must have a sample size of at least ______


37. If your sample is smaller than 30, it is still possible to establish a confidence
interval for μ, provided you can assume the population distribution is normal;
for example, suppose in your study of air-traffic controllers you are able to
observe only samples of 20. The graphs below summarize the data you obtain
for three different tasks.


[Figure: three frequency graphs of time, one each for Task A, Task B, and Task C]


You are prepared from experience to assume that the population distributions
are normal unless the data make this assumption unlikely. Would these data
make you assume that the population distribution for any of these tasks is not
normal? If so, which one(s)?
Task C. The sample distribution is so skewed that it seems unlikely to have
come from a normally distributed population. (Although the sample distribution
for Task A is also slightly skewed, you are quite likely to draw a slightly
skewed sample of 20 from a normally distributed population.)


38. For samples larger than 30 must you assume that the population is normally
distributed in order to establish a confidence interval for μ?


39. For samples smaller than 30 must you assume that the population is normally
distributed in order to establish a confidence interval for μ?


40. Can you establish a confidence interval for μ on the basis of a sample of 15 from
a population that is not normally distributed?
41. When you establish a confidence interval for μ for a small, normally distributed
sample, you must use a sampling distribution called “Student's t.” This distribution
is similar to the normal distribution, but its exact shape depends on the
size of the sample. The procedure is the same except that

(a) instead of using a value of z_c from the normal probability table, you must
use a value of t_c from a t table;
(b) you must always use s, even if σ is known;

for example, let us establish a 99% confidence interval for the mean time of
task B in frame 37.


x̄ = 3.00    s = 0.53    n = 20


From the table we find that t_c for 99% of a sample of 20 is 2.86. (Patience, you
will learn how to read the table in a moment.)


The formula for the confidence interval of a small, normally distributed sample
is

    x̄ ± t_c (s/√n)

What is the confidence interval in this case?


x̄ ± t_c (s/√n) = 3.00 ± 2.86 (0.53/√20) = 3.00 ± 0.34; μ is between 2.66 and
3.34 with 99% confidence.
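
As a quick check of the arithmetic, here is a minimal sketch of the small-sample formula. The Python standard library has no t table, so the value of t_c is simply passed in as read from the printed table; `t_interval` is a hypothetical helper name.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_c):
    """Small-sample confidence interval: x_bar plus or minus t_c * s / sqrt(n)."""
    margin = t_c * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Task B: x_bar = 3.00, s = 0.53, n = 20, t_c = 2.86 (table value for 99%, df = 19).
low, high = t_interval(x_bar=3.00, s=0.53, n=20, t_c=2.86)
print(f"mu is between {low:.2f} and {high:.2f} with 99% confidence")
```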
CRITICAL POINTS OF THE t DISTRIBUTION


The first column lists the number of degrees of
freedom (ν). The headings of the other columns
give probabilities (P) for t to exceed the entry value.
Use symmetry for negative t values.


 ν    P = 0.10    0.05     0.025    0.01     0.005

 1      3.078     6.314   12.706   31.821   63.657
 2      1.886     2.920    4.303    6.965    9.925
 3      1.638     2.353    3.182    4.541    5.841
 4      1.533     2.132    2.776    3.747    4.604
 5      1.476     2.015    2.571    3.365    4.032
 6      1.440     1.943    2.447    3.143    3.707
 7      1.415     1.895    2.365    2.998    3.499
 8      1.397     1.860    2.306    2.896    3.355
 9      1.383     1.833    2.262    2.821    3.250
10      1.372     1.812    2.228    2.764    3.169


42. The above is an excerpt from a t table. Each line represents the distribution of
t for a particular sample size. The df (degree of freedom) column on the left
allows you to select the correct distribution for your sample size. The degrees
of freedom in a single sample are equal to n − 1. Thus the degrees of freedom
for a sample of eight are ______


43. The degrees of freedom for a sample of 10 are ______
44. The headings at the top of the table give the probabilities that the mean is
outside the upper or lower limit set by the indicated values of t; for example, if
you are setting a 95% confidence interval, you will want the probability of a
mean outside the upper limit to be only 2.5%. (Of course, the probability of a
mean outside the lower limit will also be 2.5%.) You will therefore choose a
value of t from the column headed 0.025. What column will you use if you
are setting a 90% confidence interval?


The column headed 0.05


45. You want to look up t_c for a 98% confidence interval for a sample of six. What
value of df will you use?


five (df = n − 1)


46. What column will you use?
47. Circle the correct entry in the table at frame 42.


CRITICAL POINTS OF THE t DISTRIBUTION


The first column lists the number of degrees of
freedom (ν). The headings of the other columns
give probabilities (P) for t to exceed the entry value.
Use symmetry for negative t values.
The circled entry is shown in brackets.


 ν    P = 0.10    0.05     0.025    0.01     0.005

 1      3.078     6.314   12.706   31.821   63.657
 2      1.886     2.920    4.303    6.965    9.925
 3      1.638     2.353    3.182    4.541    5.841
 4      1.533     2.132    2.776    3.747    4.604
 5      1.476     2.015    2.571   [3.365]   4.032
 6      1.440     1.943    2.447    3.143    3.707
 7      1.415     1.895    2.365    2.998    3.499
 8      1.397     1.860    2.306    2.896    3.355
 9      1.383     1.833    2.262    2.821    3.250
10      1.372     1.812    2.228    2.764    3.169


48. You want to set a 99% confidence interval for μ on the basis of a sample of
15. Look up the appropriate t_c in Table V on page 270.


t_c = ______


2.977


You will notice that the t table has entries for df of more than 30. If the population
is normally distributed but σ is unknown, the t table can be used to obtain a precise
confidence interval for μ, more accurate than the approximation obtained when
you use s as an estimate of σ with the normal distribution table. The larger the
sample, the closer t_c comes to z_c, so that for large samples there is little practical
difference between the two procedures.
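
The two-step lookup described in the preceding frames (row from df = n − 1, column from half of the leftover probability) can be sketched as follows. The dictionary holds only the excerpt printed at frame 42, and `t_critical` is a hypothetical helper name, so the sketch works only for samples of 2 to 11.

```python
# Column headings (upper-tail probabilities) of the excerpted t table.
COLUMNS = (0.10, 0.05, 0.025, 0.01, 0.005)

# Rows of the excerpt, keyed by degrees of freedom.
ROWS = {
    1: (3.078, 6.314, 12.706, 31.821, 63.657),
    2: (1.886, 2.920, 4.303, 6.965, 9.925),
    3: (1.638, 2.353, 3.182, 4.541, 5.841),
    4: (1.533, 2.132, 2.776, 3.747, 4.604),
    5: (1.476, 2.015, 2.571, 3.365, 4.032),
    6: (1.440, 1.943, 2.447, 3.143, 3.707),
    7: (1.415, 1.895, 2.365, 2.998, 3.499),
    8: (1.397, 1.860, 2.306, 2.896, 3.355),
    9: (1.383, 1.833, 2.262, 2.821, 3.250),
    10: (1.372, 1.812, 2.228, 2.764, 3.169),
}

def t_critical(n, confidence):
    """Look up t_c for a sample of size n at the given confidence level."""
    df = n - 1                                # degrees of freedom for a single sample
    tail = round((1 - confidence) / 2, 3)     # probability left outside each limit
    return ROWS[df][COLUMNS.index(tail)]

print(t_critical(n=6, confidence=0.98))   # row df = 5, column headed 0.01
```

For larger samples you would need the full Table V, or a statistics package that computes t values directly.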
49. For each of the following samples how would you go about establishing a 98%
confidence interval for μ? Choose the correct answer for each sample.


Sample A


The population from which this sample was drawn has a known standard deviation:
σ = 7.85. The population distribution is known not to be normal.


(a) Use z
(b) Use t
(c) Can't establish a confidence interval


Sample B


The population is assumed to have a normal distribution.


(a) Use z
(b) Use t
(c) Can't establish a confidence interval


Sample C


The standard deviation of the population is unknown. The distribution is
known not to be normal.


(a) Use z
(b) Use t
(c) Can't establish a confidence interval


Sample D


The population is known to be normally distributed, with a standard deviation:
σ = 8.0.


(a) Use z
(b) Use t
(c) Can't establish a confidence interval


Sample E


The population is known to be normally distributed.


(a) Use z
(b) Use t
(c) Can't establish a confidence interval


n = 24


Sample F


On the basis of experience you assume the population is not normally distributed.


(a) Use z
(b) Use t
(c) Can't establish a confidence interval
Sample A   (a) Use z                    The sample is large
Sample B   (b) Use t                    The sample is small but from a normal
                                        population
Sample C   (a) Use z                    The sample is large
Sample D   (b) Use t                    The sample is small but from a normal
                                        population
Sample E   (a) Use z or (b) Use t*      The sample is large and from a normal
                                        population
Sample F   (c) Can't establish          The sample is small and the
               a confidence             population is not normally
               interval                 distributed


*The results will be almost identical, whichever method you use. Theoretically t is more accurate,
since you do not know the exact value of σ.
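
The reasoning behind these answers can be summarized as a small decision rule. The function below is a sketch of the rules as stated in this chapter (a sample of 30 or more counts as large); `choose_method` and its arguments are illustrative names, not anything from the book.

```python
def choose_method(n, population_normal, sigma_known):
    """Pick the table to use for a confidence interval for the mean.

    n                 -- sample size
    population_normal -- True if the population can be assumed normal
    sigma_known       -- True if the population standard deviation is known
    """
    if n >= 30:
        # Large sample: the normal table applies whatever the population shape.
        if population_normal and not sigma_known:
            return "z or t"     # t is slightly more accurate; results nearly identical
        return "z"
    if population_normal:
        return "t"              # small sample: use t with s, even if sigma is known
    return "none"               # small sample from a non-normal population

print(choose_method(n=20, population_normal=True, sigma_known=True))    # like Sample D
print(choose_method(n=15, population_normal=False, sigma_known=False))  # like Sample F
```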


LARGE SAMPLE CONFIDENCE INTERVALS FOR P 


Just as you can find confidence intervals for μ by using the sampling distribution of
x̄, you can also find confidence intervals for P by using the sampling distributions
for p; for example, if half a random sample of 30 visitors at a given resort are men
(p = 0.50), you can state with 99% confidence that between 0.27 and 0.73 of all
visitors are male. For small samples special tables based on the binomial probability
distributions are available. For large samples the binomial probability distribution
becomes approximately normal in shape. Therefore the normal probability table can
be used to set a confidence interval for P if the sample is large.


50. Suppose you are trying to estimate the effectiveness of a vaccine. You administer
the vaccine to 200 people and then test to determine whether they have
acquired an immunity to the disease in question. You find that 185 of 200 have
acquired an immunity; that is, p = ______


0.925 (185/200)


51. You would like to establish a 99% confidence interval for P. First you must
make sure the sample is large enough. To establish a confidence interval for P
the smaller group in your sample must have at least 10 cases. The smaller group
in this case consists of the 15 cases who did not acquire an immunity. Is the
sample large enough?
52. The formula for establishing a large-sample confidence interval for P is

    p ± z_c √(pq/n)

In this formula p, as you know, is the probability of whatever outcome you
are concerned with. The probability of any other outcome is q; for example, if
p is the probability of acquiring immunity, q is the probability of not acquiring
immunity: p + q = 1. If p = 0.925, q = ______


53. Use the formula to establish a confidence interval for P. The z_c you use is the
same z_c you would use to establish a large-sample confidence interval for μ;
P is between ______ and ______ with 99% confidence.


0.973 and 0.877

    p ± z_c √(pq/n) = 0.925 ± 2.58 √((0.925)(0.075)/200) = 0.925 ± 0.048
54. In a sample of 100 cases 20 people who receive a fund-raising letter contribute
and 80 do not. Predict the proportion P you would expect to contribute if the
same letter were sent to the population of 10,000 from which the sample was
selected. Use a 95% confidence interval.


    p ± z_c √(pq/n) = 0.20 ± 1.96 √((0.20)(0.80)/100) = 0.20 ± 0.08


P is between 0.12 and 0.28 with 95% confidence.
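
Both the vaccine and the fund-raising examples follow the same pattern, which can be sketched as one function. `proportion_interval` is a hypothetical helper name; the z_c values are the table values used in the text, and the check for at least 10 cases in the smaller group follows the rule given in frame 51.

```python
from math import sqrt

def proportion_interval(successes, n, z_c):
    """Large-sample confidence interval for a population proportion P."""
    if min(successes, n - successes) < 10:
        raise ValueError("the smaller group must have at least 10 cases")
    p = successes / n               # sample proportion
    q = 1 - p                       # proportion of all other outcomes
    margin = z_c * sqrt(p * q / n)
    return p - margin, p + margin

vaccine = proportion_interval(successes=185, n=200, z_c=2.58)   # 99% confidence
letter = proportion_interval(successes=20, n=100, z_c=1.96)     # 95% confidence

print(f"vaccine: {vaccine[0]:.3f} to {vaccine[1]:.3f}")
print(f"letter:  {letter[0]:.2f} to {letter[1]:.2f}")
```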


REVIEW PROBLEMS 


If you have successfully completed this chapter, you can now use sample statistics to 
estimate population parameters. You can: 


• estimate μ, P, and σ;
• establish a confidence interval for μ, using a sample mean and standard
deviation;
• establish a confidence interval for P, using data from a large sample.

Now try these review problems. Table I on pp. 253-256 lists any formulas you
may need for reference.


1. The following data are a sample selected from a population you assume to be
normally distributed. What is your best estimate of μ and σ? Establish a 95%
confidence interval for μ.


Data: 3, 4, 5, 6, 6, 7, 7, 7, 8, 8, 8,9, 9, 10, 11, 12 
2. In an appropriately selected sample of 144 members of a fraternal organization,
20% are college graduates. Estimate the percentage you would find if you
surveyed all 200,000 members. Establish a 99% confidence interval.


3. You select a random sample of 100 chocolate-covered peanuts from the output
of a candy factory. Measuring the thickness of the chocolate coating yields
the following statistics for this sample:


x̄ = 0.1 cm    s = 0.01 cm


Establish a 95% confidence interval for the parameter μ.


4. State briefly the assumptions involved in establishing a confidence interval using
z and t.
Answers

To review a problem, study the frames indicated after the answer.


1. The best estimate of μ is x̄.

The best estimate of σ is s.

    s = √( Σ(x − x̄)² / (n − 1) ) = √(88/15) = √5.87 = 2.42

For a confidence interval you must use t because the sample is small.
t_c = 2.13 (df = 15)
The formula for the confidence interval is

    x̄ ± t_c (s/√n) = 7.5 ± 2.13 (2.42/√16)
                   = 7.5 ± 2.13 (0.605) = 7.5 ± 1.29

μ is between 6.21 and 8.79 with 95% confidence. See frames 37 to 49.

2. This is a large-sample confidence interval for P.
The formula is

    p ± z_c √(pq/n) = 0.20 ± 2.58 √((0.20)(0.80)/144)
                    = 0.20 ± 2.58 (0.033) = 0.20 ± 0.085

P is between 0.115 and 0.285 with 99% confidence. See frames 50 to 54.

3. Because of the large sample size, the appropriate formula is

    x̄ ± z_c (s/√n) = 0.1 ± 1.96 (0.01/√100) = 0.1 ± 0.00196

The mean thickness of the chocolate in the factory's output is between 0.098
and 0.102 cm with 95% confidence. See frames 21 to 36.

4. Using z to establish a confidence interval assumes that the sample size is more than
30. Either σ is known or s is used as an estimate. Using t to establish a confidence
interval assumes that the population is normally distributed; σ is never
used, always s. See frames 32 to 36, 37 to 42, and 49.
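
The arithmetic in answer 1 can be re-checked with a short sketch. The data are those of review problem 1, and 2.13 is the table value of t_c for df = 15 quoted in the answer; the variable names are illustrative.

```python
import statistics
from math import sqrt

data = [3, 4, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11, 12]

n = len(data)                     # 16 observations, so df = 15
x_bar = statistics.mean(data)     # best estimate of mu
s = statistics.stdev(data)        # best estimate of sigma (divides by n - 1)
t_c = 2.13                        # table value for 95% confidence, df = 15

margin = t_c * s / sqrt(n)
print(f"x_bar = {x_bar}, s = {s:.2f}")
print(f"mu is between {x_bar - margin:.2f} and {x_bar + margin:.2f}")
```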




CHAPTER FOUR 


Hypothesis Testing 


In scientific research, we develop theories and then conduct experiments to test 
them. Since we cannot achieve perfect control in any experiment, some chance 
variation always results. For example, even the most precise measurement system 
has some limit to its accuracy, and the laboratory mice bred for biological research 
to be as similar as possible are still not completely identical. Also, there is a limit to 
the number of times observations can be repeated, so our measurements are always 
only a sample of reality. 

To generalize from our experimental sample, we use statistical techniques to test 
how big a part chance plays in the outcome of an experiment. We begin by 
assuming that our experimental results reflect only the random variations caused 
by assorted factors beyond our control. This assumption is called the null hypothesis.
If our experiment is successful and our theory is true, we will be able to reject
the null hypothesis by showing that chance variation is not a reasonable explana- 
tion for our results. 

When you have completed this chapter, you will be able to: 


• state a null hypothesis and alternative and establish a critical region for
theories about P;

• perform a similar statistical test for theories about μ;

• judge the probability of accepting a theory when it is false or rejecting a theory
when it is true.


HYPOTHESIS TESTING—PROPORTIONS 


To understand how statistical tests are used in the testing of scientific theories, we 
will begin by looking at how to use the binomial probability table to test theories 
about proportions. 


1. There is a formal procedure for the statistical testing of theories in scientific 
research: 


(a) Plan an experiment so that if the results cannot be explained by the chance 
variation involved in drawing a sample your theory will be confirmed. 

(b) Conduct the experiment and collect sample data. 

(c) Assume that the results are due to chance alone. This assumption is called
the null hypothesis.

(d) Use a theoretical sampling distribution based on the null hypothesis to
determine the probability of obtaining sample data like yours by chance
alone.

(e) If the probability of obtaining sample data like yours by chance alone is
less than some predetermined small percentage (usually 5% or 1%) the
results will be significant. You may reject the null hypothesis and consider
your theory confirmed.


In this procedure the null hypothesis assumes that the results of the experiment 
are due to 




chance, the chance variation involved in drawing a sample 


If you reject the null hypothesis, your theory (is/is not) confirmed. 


Study these definitions: 


Null hypothesis        assumption that experimental results are due to chance
                       alone

Alternative            your theory (will be confirmed if you reject the null
                       hypothesis)

Significant results    experimental results that are not likely to have occurred
                       by chance alone


The assumption that experimental results are due to chance is called the


null hypothesis


Your theory is called the


alternative 


If the results are significant, you will reject the 


null hypothesis 
6. Nonsignificant results (will/will not) allow you to reject the null hypothesis.



will not 


7. Results that are unlikely to have occurred by chance are called 


significant 


8. Let us look at an example of this procedure. A researcher is studying the behavior
of fruit flies. He wants to investigate the theory that “you can catch more
flies with honey than with vinegar.” He establishes a standardized procedure for
using both substances as bait and he plans to count the total number of flies
caught, n, and the proportion caught with honey, p. His null hypothesis is that
there is no real difference between the number of flies that can be caught with
honey or vinegar. The differences are due only to chance variations from sample
to sample. If this is true, the population parameter P =


0.5. Flies show no real preference in how they get caught. They might as well just 
flip a coin. 
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9. The researcher decides to continue the experiment until he catches 15 flies. He 
can now find a theoretical sampling distribution based on the null hypothesis by 
looking at a binomial probability table. For P = 0.5 andn = 15 the distribution 
is as follows: 

 n   r  0.05   0.1    0.2    0.25   0.3    0.4    0.5    0.6    0.7    0.75   0.8    0.9    0.95

15   0  0.463  0.206  0.035  0.013  0.005
     1  0.366  0.343  0.132  0.067  0.031  0.005
     2  0.135  0.267  0.231  0.156  0.092  0.022  0.003
     3  0.031  0.129  0.250  0.225  0.170  0.063  0.014  0.002
     4  0.005  0.043  0.188  0.225  0.219  0.127  0.042  0.007  0.001
     5  0.001  0.010  0.103  0.165  0.206  0.186  0.092  0.024  0.003  0.001
     6         0.002  0.043  0.092  0.147  0.207  0.153  0.061  0.012  0.003  0.001
     7                0.014  0.039  0.081  0.177  0.196  0.118  0.035  0.013  0.003
     8                0.003  0.013  0.035  0.118  0.196  0.177  0.081  0.039  0.014
     9                0.001  0.003  0.012  0.061  0.153  0.207  0.147  0.092  0.043  0.002
    10                       0.001  0.003  0.024  0.092  0.186  0.206  0.165  0.103  0.010  0.001
    11                              0.001  0.007  0.042  0.127  0.219  0.225  0.188  0.043  0.005
    12                                     0.002  0.014  0.063  0.170  0.225  0.250  0.129  0.031
    13                                            0.003  0.022  0.092  0.156  0.231  0.267  0.135
    14                                                   0.005  0.031  0.067  0.132  0.343  0.366
    15                                                          0.005  0.013  0.035  0.206  0.463

How often would he expect to catch 10 or more flies with honey by chance 
alone? 
0.151 or 15.1% of the time. You must add the frequencies for 10,11,12,..., 15. 
Note: the blank areas in the table represent probabilities so small that they 
round off to zero. 
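
If you have a computer handy, you can check this sum directly from the binomial formula instead of the table. The short Python sketch below is only an illustration; the function names `binom_pmf` and `upper_tail` are our own and do not come from any statistical package.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Chance of catching 10 or more of 15 flies with honey when P = 0.5
print(round(upper_tail(10, 15, 0.5), 3))  # 0.151
```

The result agrees with the value obtained by adding the table entries for 10 through 15.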

10. The researcher decides that he will reject the null hypothesis only if his results
would occur by chance less than 2% of the time. How many flies will he have
to catch with honey to reject the null hypothesis? Use the table in frame 9.


At least 12. The frequencies for 12, 13, 14, 15 add up to 0.017 or 1.7%.
Catching 11 or more flies would occur by chance 0.059 or 5.9% of the time, so
12 is the smallest number that will meet the researcher’s criterion.
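
The same search can be written out as a small loop: try each possible count in turn and stop at the first one whose upper-tail probability falls below the chosen limit. This is a sketch under our own naming (`smallest_critical_count` is a hypothetical helper, not a library function).

```python
from math import comb

def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def smallest_critical_count(n, p_null, limit):
    """Smallest k whose upper-tail probability under the null is below limit."""
    for k in range(n + 1):
        if upper_tail(k, n, p_null) < limit:
            return k
    return None  # no count is extreme enough

print(smallest_critical_count(15, 0.5, 0.02))  # 12
```

Because the upper-tail probability can only shrink as the count grows, the first count that passes the check is the smallest one that meets the criterion.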
11.


12. 


13. 


14. 


The researcher catches 13 flies with honey and 2 with vinegar. Are his results 
significant? 


The set of results that are significant is called the critical region. In this case
the critical region is p ≥ 12/15 (p is equal to or greater than 12/15). Is 13/15 in the
critical region?


Is 12/15 in the critical region?


For this experiment, p ≥ 12/15 is called the


critical region


[Figure: probabilities of the possible results, 0 through 15, with the critical region marked. The 2% significance level is the probability of results in the critical region by chance alone.]

The significance level is the probability that significant results will occur by
chance. For this experiment the researcher is using a significance level of ___%.


significance level


17.


18. 


19. 


20. 


21.


We can summarize the statistical test of the researcher’s hypothesis as follows: 


Null hypothesis       P = 0.5
Alternative           P > 0.5 (P is greater than 0.5)
Significance level    2%
Critical region       p ≥ 12/15 (p is equal to or greater than 12/15)


If, in fact, you can catch more flies with vinegar than with honey, will the 
researcher find it out from this statistical test? 




No. Of course, he could have started out with the theory that you can catch more 
flies with vinegar, have considered a fly caught with vinegar a success, and used 
a similar test. 


Let us restate the researcher’s hypothesis slightly and see how this affects the
statistical test. “You will catch a different number of flies with honey than with
vinegar.” If you can catch flies only with vinegar (P = 0), will this theory be 
confirmed? 




Which of the following is the appropriate alternative for a statistical test of 
this theory? 


(a) P > 0.5 (P is greater than 0.5)
(b) P < 0.5 (P is less than 0.5)
(c) P ≠ 0.5 (P is not equal to 0.5)


(c) P ≠ 0.5


22. Which of the illustrations below shows an appropriate critical region for this
statistical test?


[Figure (a): possible results 0 through 15, with a critical region at one end of the distribution only.]


[Figure (b): possible results 0 through 15, with critical regions at both ends of the distribution.]


(b). In this case either a very high p or a very low p would be consistent with 
the alternative and would cause you to reject the null hypothesis. 




23. 


24. 


25. 


26. 


To establish the critical region for this statistical test you must consider both
ends of the sampling distribution; for example, how often would you expect
p ≤ 3/15 (p equal to or less than 3/15) or p ≥ 12/15 (p equal to or greater than 12/15)?


0.034 or 3.4% of the time. You must add the frequencies for 0, ..., 3 and
12, ..., 15.
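
As a check on the two-ended sum, the sketch below adds the lower and upper tails of the binomial distribution for P = 0.5 and n = 15. The helper name `two_tailed` is our own. Note that the exact sum rounds to 0.035; the figure 0.034 comes from adding table entries that have already been rounded to three places.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_tailed(low, high, n, p):
    """P(successes <= low) + P(successes >= high)."""
    lower = sum(binom_pmf(k, n, p) for k in range(0, low + 1))
    upper = sum(binom_pmf(k, n, p) for k in range(high, n + 1))
    return lower + upper

print(round(two_tailed(3, 12, 15, 0.5), 3))  # 0.035 (exact sum before rounding each entry)
print(round(two_tailed(2, 13, 15, 0.5), 3))  # 0.007
```

The first region is too large for a 2% significance level; the second, narrower region is well within it.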


If the researcher continues to use a significance level of 2%, what will his critical 
region be? 


Summarize the statistical test for the theory that “you will catch a different
number of flies with honey than with vinegar.”


Null hypothesis
Alternative
Significance level
Critical region


Null hypothesis       P = 0.5
Alternative           P ≠ 0.5
Critical region       p ≤ 2/15 or p ≥ 13/15


A caution before we go on. The statistical test of a hypothesis takes into account
the chance variations involved in drawing a sample. It does not take into
account unplanned influences in the way we conduct the experiment. In a poorly
planned experiment we may obtain statistically significant results that are
meaningless; for example, suppose that there is a prevailing wind in the
direction of the honey at the time we conduct our fruit fly experiments. We catch
flies with honey, a statistically significant result. Does this result support the
theory that you can catch more flies with honey than with vinegar? Why?


No. You cannot tell whether it was the honey or the direction of the wind that 
determined the outcome of the experiment.


27.


28. 


29. 


30. 


You want to test the theory that a different number of women than men attend 
sports events. In an appropriately selected sample of 15 attendees you count 
the proportion of women, p. What is an appropriate null hypothesis for a 
statistical test of your theory? 


P ≠ 0.5. You have no firm prediction that women or men are more likely to
attend.


Use the table in frame 9 to determine a critical region for a significance level of
5%. What is the critical region?


You want to test the theory that more men than women attend sports events. 
In an appropriately selected sample of 15 attendees you count the proportion of 
women, p. Summarize an appropriate statistical test for your theory, using a 
5% significance level. 


Null hypothesis 
Alternative 
Critical region 


Null hypothesis       P = 0.5
Alternative           P < 0.5
Critical region       p ≤ 3/15


31.


You want to test the theory that more men than women attend sports events. 
Your research assistant, an attractive young woman, records data on the first 
25 people she meets at the stadium. The results are statistically significant. 
Comment on the meaning of these results. 




The experiment was poorly planned. All attendees cannot be assured an equal 
probability of being selected for the sample. As a result, the data are meaningless
as a test of your theory. You might use an experiment of this nature as a test of 
the theory that an attractive young woman will meet more men than women at 
sports events. 


HYPOTHESIS TESTING—MEANS 


So far you have used the hypothesis-testing procedure for theories about the
parameter P, but you can use the same kind of procedure for theories about μ.


32. 


Our fruit fly researcher believes feeding fruit flies a mixture of honey and 
vinegar will alter their life-span. He knows from long experience that the 
particular strain of fruit fly he is working with has a mean life-span of 12 days, 
with a standard deviation of two, when fed their normal diet of apple juice. He 
plans to raise a sample of 50 fruit flies on honey and vinegar and calculate 
their mean life-span. What is an appropriate null hypothesis for a statistical 
test of this theory? 


33.


34. 


35. 


36. 


37. 


What is the alternative? 


The central limit theorem allows him to develop a sampling distribution based
on his null hypothesis. The mean of the sampling distribution will be ___.


$\sigma_{\bar{x}} = 0.28$.


The procedure for establishing a critical region in this case is the same as
that for establishing a confidence interval for μ. If our fruit fly researcher uses
a 1% significance level (equivalent to a 99% confidence interval), what will be the
critical region? z ≥ ____ or z ≤ ____


z ≥ +2.58 or z ≤ −2.58


$$ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$

If the researcher finds that the mean life-span of his sample of 50 fruit flies is 
12.5 days, can he reject the null hypothesis? 




No (z = +1.77) 
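
The arithmetic behind this answer can be sketched in a few lines of Python. The function name `z_score` is our own; the numbers are those given for the fruit fly experiment (hypothesized mean 12, standard deviation 2, sample of 50, sample mean 12.5).

```python
from math import sqrt

def z_score(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / sqrt(n))

z = z_score(12.5, 12, 2, 50)
critical = 2.58  # two-ended critical value for a 1% significance level
print(round(z, 2), abs(z) >= critical)  # 1.77 False
```

Since 1.77 does not reach 2.58 in either direction, the sample mean is outside the critical region.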
38. Summarize the statistical test:

Null hypothesis
Alternative
Critical region


Null hypothesis       μ = 12
Alternative           μ ≠ 12
Critical region       z ≥ +2.58 or z ≤ −2.58


39. A company has administered a brief verbal intelligence test to all employees
for a number of years. For all 6000 present employees the mean score is 50, with
a standard deviation of 10. A personnel researcher wishes to investigate the
theory that first-line supervisors score higher on this test than the average
employee. To test this theory she selects a random sample of 100 first-line
supervisors and computes their mean score on the test. In this problem you
are concerned only with one end of the sampling distribution. Summarize an
appropriate statistical test to confirm this theory at the 1% level of
significance.


Null hypothesis 
Alternative 
Critical region 


Null hypothesis       μ = 50
Alternative           μ > 50
Critical region       z ≥ +2.33


40. If the mean score of the sample is 53, are the results significant? Refer to the 
reference formulas in Table I if you wish. 


$$ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{53 - 50}{10/\sqrt{100}} = 3.0 $$
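
Critical values of z need not be looked up in a table; they can be computed from the normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module. The helper `critical_z` is our own name, and it simply puts the whole significance level in one end of the distribution for a one-ended test, or half of it in each end for a two-ended test.

```python
from math import sqrt
from statistics import NormalDist

def critical_z(alpha, two_tailed):
    """Critical z value for significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

z = (53 - 50) / (10 / sqrt(100))
print(round(z, 2))                                   # 3.0
print(round(critical_z(0.01, two_tailed=False), 2))  # 2.33
print(round(critical_z(0.01, two_tailed=True), 2))   # 2.58
```

With either critical value, a z score of 3.0 for the supervisors falls in the critical region.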
THE PROBABILITY OF ERROR


When you test theories using statistical procedures, there is always a certain 
possibility that an unusual randomly selected sample will lead you astray. You 
cannot eliminate the possibility of error, but you can calculate how great the chance 
of error is and, if necessary, change your experiment to keep it within acceptable 
limits. 


41. A statistical test of a theory never gives absolute certainty. Even when an
experiment is planned perfectly, there are two possible types of error:


Type I    The theory is not true (the null hypothesis is true) but the results are
          significant by chance.

Type II   The theory is true (the null hypothesis is false) but the results are not
          significant.


As an example, if you use a 5% significance level, you will obtain significant
results by chance 5% of the time. This is an example of a Type ___ error.


Type I error 


42. Type I error occurs when the theory is (true/false) and the results are
(significant/not significant).


false 
significant 


43. If the theory is true but the results are not significant, it is called a Type ___
error.


44. A Type II error occurs when the theory is (true/false) and the results are 
(significant/not significant). 




true 
not significant


45.


46. 


47. 


48. 


Label the situations that correspond to Type I and Type II errors: 


(a) Theory false, results significant 
(b) Theory true, results significant 
(c) Theory false, results not significant 
(d) Theory true, results not significant 




(a) Theory false, results significant; Type I error 

(b) Theory true, results significant 

(c) Theory false, results not significant 

(d) Theory true, results not significant; Type II error 


In 100 similar studies of the difference between two teaching methods 75
studies result in significant differences between the two methods and 25 find
no significant difference. If you believe that all the studies were well designed,
you will probably consider the 25 studies in which no significant difference was
found to be examples of Type ___ errors.


Seventy-five studies are conducted to determine the effect of a particular drug
on the appetite of white rats. The sample sizes vary and several different
measures of appetite are used. Four of the studies have results significant at
the 5% level. What type of error do you think may be involved here?


Type I. In a large number of studies you would expect a certain number of
significant results by chance alone.


The significance level you use determines the probability of a Type ___ error.


49.


50.


51.


52. 


53. 


With a significance level of 1%, the probability of a Type I error is 


0.01 (1% of the time) 


The Greek letter alpha (α) is used as the symbol for the probability of a Type
I error. For a statistical test using a significance level of 5% we say α = 0.05.
For a statistical test using a significance level of 2% we say


The Greek letter beta (β) is used as the symbol for the probability of a Type II
error. Thus the symbol for the probability of Type I error is ___ and the symbol
for the probability of Type II error is ___.


α
β


β is the probability of a Type ___ error.
α is the probability of a Type ___ error.


If a researcher’s theory is true and β is very large, the chances of proving the
theory with a statistical test are (excellent/poor).


poor. He is likely to make a Type II error.


54.


55. 


57.


Instead of referring to β, researchers sometimes use the term “power.” Power
is (1 − β). A statistical test with a low β has a (high/low) power.


A statistical test for which there is a high probability of a Type II error has 
(high/low) power. 




Would you say that power in a statistical test is a good thing? 




yes. You have a better chance of confirming your theories if they are correct. 


If you make an assumption about the true state of affairs, you can determine
the power of a statistical test; for example, let us consider the test at the start
of this chapter for the theory that “You can catch more flies with honey than
with vinegar.” As you recall, the experiment consisted of catching 15 flies and
counting the number caught with honey as opposed to vinegar. A significance
level of 2% was chosen (α = 0.02).


Null hypothesis       P = 0.5
Alternative           P > 0.5
Critical region       p ≥ 12/15


To reject the null hypothesis we must catch at least ___ of the 15 flies with
honey.


58. Let us assume that if we could test all flies 80% would prefer honey (P = 0.8).
We can now use the binomial probability table to give us a sampling distribution 
for samples of 15 from this population. The distribution is as shown below. 


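 n   r  0.05   0.1    0.2    0.25   0.3    0.4    0.5    0.6    0.7    0.75   0.8    0.9    0.95

15   0  0.463  0.206  0.035  0.013  0.005
     1  0.366  0.343  0.132  0.067  0.031  0.005
     2  0.135  0.267  0.231  0.156  0.092  0.022  0.003
     3  0.031  0.129  0.250  0.225  0.170  0.063  0.014  0.002
     4  0.005  0.043  0.188  0.225  0.219  0.127  0.042  0.007  0.001
     5  0.001  0.010  0.103  0.165  0.206  0.186  0.092  0.024  0.003  0.001
     6         0.002  0.043  0.092  0.147  0.207  0.153  0.061  0.012  0.003  0.001
     7                0.014  0.039  0.081  0.177  0.196  0.118  0.035  0.013  0.003
     8                0.003  0.013  0.035  0.118  0.196  0.177  0.081  0.039  0.014
     9                0.001  0.003  0.012  0.061  0.153  0.207  0.147  0.092  0.043  0.002
    10                       0.001  0.003  0.024  0.092  0.186  0.206  0.165  0.103  0.010  0.001
    11                              0.001  0.007  0.042  0.127  0.219  0.225  0.188  0.043  0.005
    12                                     0.002  0.014  0.063  0.170  0.225  0.250  0.129  0.031
    13                                            0.003  0.022  0.092  0.156  0.231  0.267  0.135
    14                                                   0.005  0.031  0.067  0.132  0.343  0.366
    15                                                          0.005  0.013  0.035  0.206  0.463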


What is the probability of catching 12 or more flies with honey from this 
population? 




0.648, about 65% of the time 


59. We can say that the power of our test against the alternative P = 0.8 is 0.648.
The chance of rejecting the null hypothesis in this case is 0.648. Therefore the
chance of accepting it incorrectly is


1 − 0.648 = 0.352


60.


61.


62. 


63. 


If P = 0.8, how often will this statistical test result in a Type II error? β =


β = 0.352


What is the power of this test against the alternative P = 0.9? You will use the 
same reasoning but now you must use the sampling distribution for P = 0.9. 


The power of the test against the alternative P = 0.9 is 0.945. 
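
The power calculations for both alternatives can be sketched as follows. The function name `power` is our own; it simply computes the probability of landing in the critical region (12 or more flies with honey out of 15) under an assumed true value of P.

```python
from math import comb

def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power(critical_count, n, p_true):
    """Chance of a result in the critical region when the true proportion is p_true."""
    return upper_tail(critical_count, n, p_true)

for p_true in (0.8, 0.9):
    pw = power(12, 15, p_true)
    # Prints the assumed P, the power, and beta = 1 - power
    print(p_true, round(pw, 3), round(1 - pw, 3))
```

For P = 0.8 this gives a power of 0.648 and a β of 0.352. For P = 0.9 the exact power is about 0.944; adding the rounded table entries gives 0.945.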


You can make similar computations for statistical tests about the mean and for 
other statistical tests covered in this book. The details of the computations are not 
presented, but the logic of the process remains the same as in the simple case we 
have considered. You must make enough assumptions about the alternative to allow 
you to construct a theoretical sampling distribution for the alternative; for example, 
for a test about the mean you must assume values of μ and σ. Then compare this
new sampling distribution with the critical region established for your statistical 
test to determine how often you would obtain significant results under the new 
assumption. 


Three factors that influence the probability of a Type II error are significance
level, size of the sample, and variability of the population. A large α results in
a relatively smaller β; for example, an α of 0.05 will cause (more/less) risk of
a Type II error than an α of 0.01.


less. You are more likely to reject the null hypothesis even if it is true. 


An α of 0.05 will cause (more/less) risk of a Type I error than an α of 0.01.


more


65.


66. 


67. 


68. 


A large sample will result in relatively less risk of a Type II error; β against a
given alternative will be greater with an n of (35/100).


If the populations have relatively large standard deviations, β will be relatively
greater. In which situation is the risk of a Type II error greater:


(a) σ appears to be about 2.
(b) σ appears to be about 10.


(b) 


All other things being equal, the power of a statistical test will be greater with
an α of (0.05/0.01).


All other things being equal, the power of a statistical test will be greater with an
n of (30/75).


All other things being equal, the power of a statistical test will be greater if the 
standard deviations of the data appear to be in the range of (5/25). 
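
To see how the three factors act on power, here is a sketch for a one-ended z test about a mean. Everything in it is an assumption made for illustration: the function name `z_test_power`, the null mean of 50, the true mean of 52, and the particular values of σ, n, and α are ours and are not taken from any frame above.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu_null, mu_true, sigma, n, alpha):
    """Power of a one-ended (upper) z test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Shift of the sampling distribution, measured in standard errors
    shift = (mu_true - mu_null) / (sigma / sqrt(n))
    return 1 - std_normal.cdf(z_crit - shift)

base = z_test_power(50, 52, 10, 30, 0.01)
print(z_test_power(50, 52, 10, 30, 0.05) > base)  # larger alpha gives more power
print(z_test_power(50, 52, 10, 75, 0.01) > base)  # larger n gives more power
print(z_test_power(50, 52, 5, 30, 0.01) > base)   # smaller sigma gives more power
```

Each comparison changes one factor and holds the other two fixed, which is what "all other things being equal" means in these frames.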
69.


70. 


71. 


You are screening anticancer drugs for possible beneficial effects. You know 
that none of them is harmful but you are uncertain whether they do any good. 
If your results are significant for a given drug, it will be the subject of further 
intensive investigation. If your results are not significant, the investigation will 
be dropped. Which type of error would you be more concerned about avoiding 
(Type I/Type II)?


Type II. This error would result in discarding a useful drug. A Type I error 
would result only in wasted effort in further investigation. 


In this situation you would probably choose an α of (0.05/0.001).


You are investigating a theory that runs completely counter to all accepted 
theory in your field. In some preliminary studies with small samples you 
obtained results significant at the 5% level. You have now designed an experi- 
ment with a very large sample. You have refined your measurement techniques 
so that you believe the standard deviations of your data will be relatively small. 
You will probably choose an α of (0.05/0.01).


0.01. In this case you are most concerned about avoiding a Type I error.


REVIEW PROBLEMS


If you have successfully completed this chapter you can now set up a formal 
statistical test of a theory. You can: 


state a null hypothesis and an alternative; 

establish a critical region for p to test a hypothesis about P; 

establish a critical region for x̄ to test a hypothesis about μ;

comment on the practical importance of Type I and Type II error in given 
situations. 


Now try these review problems. Table I on pp. 253-256 lists any formulas you
may need for reference.


1. “We conducted an experiment based on our theory and the results were
statistically significant at the 1% level. Therefore our theory is conclusively proved.”
Comment on this assertion.


2. “For this proposed experiment the probability of a Type II error appears to be
so high that it is not worth the effort to collect the data.” What does this mean?


3. A clairvoyant claims he can predict whether a coin will land heads or tails. In an
experiment that tested his claim, a coin was tossed 10 times and he predicted
correctly eight times. Outline an appropriate statistical test. Are these results
significant at the 5% level?


4. How often would you expect to obtain eight or more correct predictions by
chance in the experiment in problem 3? 


5. The mean height of adult males in a particular ancient culture was 5 ft 2 in. with
a standard deviation of 2 in., as determined by measurement of a large number
of skeletons found in burial sites. A new site has been found that differs
somewhat from the others. The discoverer theorizes that the skeletons at this site are
from a different racial stock, with a different mean height. Outline an
appropriate statistical test with a 1% significance level.


Answers


To review a problem, study the frames indicated after the answer. 


1. The statistical test indicates that the results are not likely to have occurred as
a result of the chance variations involved in sampling. You would, however,
have to examine the design of the experiment carefully to be sure that the
experimenter’s theory is the only reasonable explanation of the results. Even if the
experimental design were perfect, there would still be a 1% chance of obtaining
such results by chance alone if the theory were not correct. See frames 26, 31,
and 41 to 71.

2. Even if the theory were true, you are not likely to obtain results that would allow
you to reject the null hypothesis. You are unlikely to obtain significant results.
See frames 41 to 71.

3. Let us call a correct prediction a success. On the average one should predict
correctly half the time just by chance.


Null hypothesis       P = 0.5
Alternative           P > 0.5
Significance level    α = 0.05
Critical region       p ≥ 9/10


Since p = 8/10, the results are not significant. See frames 1 to 31.

4. The probabilities in the binomial probability table are


 8    0.044
 9    0.010
10    0.001
      -----
      0.055


You would expect eight or more successes 5.5% of the time. See Chapter 2,
frames 52 to 71.
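
The figures for the clairvoyant problem can be checked with the same kind of binomial sum used for the fruit flies. This is a sketch with our own helper name `upper_tail`; it assumes a correct guess has probability 0.5 under the null hypothesis.

```python
from math import comb

def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_eight_or_more = upper_tail(8, 10, 0.5)
p_nine_or_more = upper_tail(9, 10, 0.5)
print(round(p_eight_or_more, 3))  # 0.055, just above 0.05
print(round(p_nine_or_more, 3))   # 0.011, so the critical region starts at 9
```

Eight correct predictions out of ten therefore just misses significance at the 5% level.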


5. Null hypothesis       μ = 5 ft 2 in.
   Alternative           μ ≠ 5 ft 2 in.
   Significance level    α = 0.01
   Critical region       z ≤ −2.58 or z ≥ +2.58


See frames 32 to 40. 


CHAPTER FIVE 
Differences Between Means 


This chapter is about statistical tests we can use to investigate the difference 
between the means of two sets of observations. You will learn about two situations. 
One is the case where your observations come in pairs. For example, you may have 
“before” and “after” observations, or you may have observations of situations that 
are carefully matched except for one factor whose effect you are investigating. The 
other case is where you have two samples selected at random. 

In each of these cases, we can use a statistical test to see if the differences can be
explained by chance variations or if we may reject this null hypothesis and say that 
the differences support an experimental hypothesis. 

When you have completed this chapter, you will be able to: 


test the significance of a set of difference scores; 

test the significance of the difference between the means of two independent 
samples; 

recognize when you have independent samples and when difference scores, and 
apply the appropriate test procedure to each. 


DIFFERENCE SCORES 


Difference scores are the result of taking two related observations, for example “twin
A” and “twin B,” “before” and “after.” When dealing with paired data of this sort,
it is important to remember that the statistical analysis is based on the differences 
between the paired scores. 


1. One common type of experimental study is the “before” and “after” study. 
Let us consider an example. We wish to test the theory that people will lose 
weight if they stick to a diet of grapefruit and whole wheat toast. To test this 
theory we select a sample of 49 people, weigh them at the start of our study, 
and weigh them again after a period of dieting. For each person we subtract 
the “before” weight from the “after” weight to obtain a difference score; for 
example, a person who weighs 125 lb at the start of the study and
118 lb at the end of the study will have a difference score of


−7




2. A person who weighs 140 lb at the start of the study and (because he loves
whole wheat toast) 150 lb at the end of the study will have a difference score
of


If on the average the diet makes no difference, the mean difference score for
the population is


zero 


If, on the average, people lose weight on this diet, the mean difference score is 


less than zero (negative) 


What is an appropriate null hypothesis for a statistical test of our theory that 
people will lose weight on this diet? 


The mean difference score for the population is zero. 


What is an appropriate alternative? 


The mean difference score for the population is less than zero.


7.


If we can construct a theoretical sampling distribution based on the null
hypothesis, we can make a statistical test of our theory. Let us consider what the
population of difference scores would be like if the diet had no effect. We have 
said that the mean difference score would be zero. Does this mean that the 
difference score of every person who adheres to the diet will be zero? 


No. Some will lose weight for reasons that have nothing to do with the diet. 
Others will gain weight. Among a sufficiently large number of dieters these 
variations should tend to cancel out to produce a mean of zero. 


In the preceding problems in which you tested a hypothesis about the mean
your null hypothesis was based on a population whose mean and standard
deviation were known. In this case the null hypothesis is based on a theoretical
analysis. We can say that if the diet has no effect the mean of a population of
difference scores should be zero, but we have no basis in theory for saying what
the standard deviation of this population should be. Instead, we must use
the sample to provide us with an estimate of σ. What statistic will we use?


As you recall, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
Since we do not know σ in this case, our best estimate is $\sigma_{\bar{x}} =$


10.


11. 


12. 


With this estimate of $\sigma_{\bar{x}}$, we can compute a z score and test our hypothesis.


Null hypothesis       μ = 0
Alternative           μ < 0
Significance level    α = 0.05
Critical region       z < −1.65


Our sample provides the following statistics:


n = 49
x̄ = −5.0
s = 3.5

Compute $z = \frac{\bar{x} - \mu}{s/\sqrt{n}} =$ ___


The results are significant. 
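
The computation for the diet study can be sketched as below. The function name `z_from_sample` is our own. The sketch takes s = 3.5, the value implied by the confidence-interval calculation later in this section (3.5 divided by the square root of 49).

```python
from math import sqrt

def z_from_sample(mean_diff, s, n, mu_null=0.0):
    """z score of a sample mean, using s/sqrt(n) as the estimated standard error."""
    return (mean_diff - mu_null) / (s / sqrt(n))

z = z_from_sample(-5.0, 3.5, 49)
print(z, z < -1.65)  # -10.0 True
```

A z score this far below −1.65 lies well inside the critical region.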


In this example we used s as an estimate of σ. This estimate is good only if the
sample size is


larger than 30 


Let’s review this hypothesis-testing procedure. According to the null hypothesis
in this case,


13.


14. 


15. 


Using a significance level of α = 0.05, we established a critical region of
z < −1.65. If the null hypothesis is true and we repeat the same experiment a
great number of times, how often would we expect to obtain a sample mean with
a z score in the critical region?


5% of the time 


If the alternative μ < 0 were true, we should expect to obtain a sample mean with
a z score in the critical region (more/less) often. 


More 


A different approach to the same problem would be to use x̄ and s to establish a
confidence interval for μ. The formula for the confidence interval is

$$ \bar{x} \pm z\,\frac{s}{\sqrt{n}} = -5 \pm 1.65\,\frac{3.5}{\sqrt{49}} = -5 \pm 0.825 $$


μ is between −5.825 and −4.175 with 90% confidence.


Since cases in which μ is even smaller than −5.825 also confirm the theory, you can
say that μ < −4.175 with 95% confidence. Clearly μ < 0 with at least 95%
confidence. An x̄ that is exactly on the boundary of the critical region (z = −1.65)
will give exactly 95% confidence that μ < 0.


For samples larger than 30 the method we have used is valid. For smaller
samples we can use the same method with the t distribution, provided we can
assume that the population of difference scores is approximately normally
distributed; for example, a psychologist believes that a particular type of
memory training will influence the ability of children to remember nonsense
syllables. He tests a class of 25 children with a list of nonsense syllables before
training and with an equivalent list after training and he finds the difference
score for each child. Before he can use a t test on his results, he must check the
assumption that the difference scores are


16.


17. 


approximately normally distributed. 


The mathematical derivation of the t tables assumes that the population is
normally distributed. In practice a t test will usually lead to the correct
conclusion even when the distribution is fairly different from a normal
distribution, provided it is not strongly skewed.


If he believes that the distribution of the difference scores is likely to be very 
different from a normal distribution, what can he do in planning his experiment 
to ensure that he will be able to make a statistical test of his results? 




Use a larger sample 


Let us assume that the difference scores tend to be normally distributed in this 
example. Outline an appropriate statistical test below. Use a significance level 
of 1% and consider both ends of the sampling distribution. Use the t table on
page 270 to establish the critical region and remember that df = n − 1.


Null hypothesis 
Alternative 
Significance level 


Critical region 


Null hypothesis       μ = 0
Alternative           μ ≠ 0
Significance level    α = 0.01
Critical region       t ≥ +2.80 or t ≤ −2.80


18. Reference formula

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$


The experimenter collects the following data: 


The mean difference between pretest and posttest is −5 points. The standard
deviation of the sample of difference scores is 20 points.


Are the results significant?


No. $t = \frac{-5 - 0}{20/\sqrt{25}} = \frac{-5}{4} = -1.25$


19. Here is the computer printout of another statistical test: 


PAIRED SAMPLES T TEST


                                   MEAN                 T      2-TAIL
VARIABLE     N     MEAN      SD    DIFF      SD       VALUE     PROB
PRE               164.3   21.5409
            10                      3.3    0.9434    3.49799    .006
POST              161.0   21.0079


What is the mean difference between PRE and POST? 
What is the probability of such a result by chance?


20.


Difference scores need not always be before and after scores; for example, 
consider the following situation: 


You are studying the effect of environment on intelligence test scores. You are 
fortunate enough to locate five sets of identical twins one of whom was raised 
in an institution and the other adopted and raised in a family setting. You test 
all the twins and obtain the following scores:


Pair No.    Raised in family    Institutionalized    Difference
   1              105                  95               −10
   2               95                  83               −12
   3              103                 103                 0
   4               98                  96                −2
   5              103                  97                −6


Can you reasonably assume that these difference scores are normally 
distributed? There is no clear evidence that they are not normally distributed and 
most researchers would accept the use of a t test. You should remember, 
however, that you are making this assumption on very limited evidence. Outline 
an appropriate statistical test for the theory that the difference in environment 
makes a difference in the intelligence test scores of these identical twins. Use a 
significance level of 5%. The t table is on page 270. 


Null hypothesis 
Alternative 
Significance level 


Critical region 


Null hypothesis       $\mu = 0$
Alternative           $\mu \neq 0$
Significance level    $\alpha = 0.05$
Critical region       $t \geq +2.78$ or $t \leq -2.78$

21. Reference formulas

$$\bar{x} = \frac{\sum x}{n} \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$


To compute t you will need to know the mean and standard deviation of the 
sample. Using the reference formulas above, compute $\bar{x}$ and $s$. (You will find 
a square root table on page 257.) 


$\bar{x} = -6.0$

$s = 5.1$


22. Complete the t test by computing t. Is the difference between the twins 
statistically significant? 


$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{-6.0 - 0}{5.1/\sqrt{5}} = \frac{-6}{2.3} = -2.6$$


The difference is not statistically significant. Your statistical test does not 
support the theory that environment affects intelligence test scores. 
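The whole twin problem can be reproduced from the five difference scores. The sketch below uses Python's standard `statistics` module, whose `stdev` function divides by n - 1; the variable names are illustrative.

```python
import math
import statistics

# Difference scores for the five pairs of twins, as listed in the table.
differences = [-10, -12, 0, -2, -6]
n = len(differences)

mean_diff = statistics.mean(differences)   # sample mean of the differences
sd_diff = statistics.stdev(differences)    # sample standard deviation (n - 1)

# t for the null hypothesis that the population mean difference is 0.
t = (mean_diff - 0) / (sd_diff / math.sqrt(n))

# Two-tailed critical value at the 5% level with df = 4 (from the t table).
critical_value = 2.78
significant = abs(t) >= critical_value

print(mean_diff, round(sd_diff, 1), round(t, 2), significant)
```

Carrying full precision gives a t slightly beyond -2.6, still short of the critical region.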
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23. 


In this experiment the result is not significant, but it is extreme enough that it 
would not occur by chance as often as 10% of the time. The sample size is small. 
What would you say about the chance of a Type I or Type II error in this case? 




The chance of a Type II error is substantial. This statistical test may well have 
led us to reject a true theory. 


THE DIFFERENCE BETWEEN TWO INDEPENDENT 
SAMPLE MEANS 


The statistical analysis appropriate for comparing two independent samples is not 
the same as the procedure you have just learned for testing difference scores. 


24. 


25. 


A common type of experimental study involves the comparison of an experi- 
mental sample and a control sample; for example, an experimenter interested 
in the effect of a plant hormone on the growth of beanstalks might treat 
alternate rows of beans with the hormone, leaving half the plants untreated. 
She will measure the height of each beanstalk and then use a statistical test to 
determine if the difference in mean height between the two samples of 
beanstalks is significant. This study does not involve the kind of individual-by- 
individual matching that was used in the difference score studies. Selecting one 
bean for the treated group does not give you any information about what 
beans will be chosen for the other group. 

Does selecting one dieter’s “before” weight tell you anything about the 
composition of the “after” sample? 




Yes. The “after” sample will include the same person. 


In which problem would you say the two samples are independent? 


(a) The bean problem 
(b) The dieting problem 


(a) The bean problem 
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26. 


27. 


For the bean problem there is a statistical test we can use to test the null 
hypothesis that two independent samples come from populations with the same 
mean. Let $\mu_1$ be the mean height of the treated plants and $\mu_2$ the mean height of 
the untreated. We would state the null hypothesis this way: 


Null hypothesis       $\mu_1 = \mu_2$


How would you state the alternative? 


Alternative           $\mu_1 \neq \mu_2$


If the experimenter has a theory that the plant hormone will stunt the growth 
of the beanstalks, how would you state the alternative? Remember that $\mu_1$ is the 
mean for the treated plants. 


If both samples are large, we can compute a z score for the difference between 
means. If the samples are small, we can use the same techniques to compute a t 
score, provided certain assumptions are true. We will not explain the 
mathematical derivation of the formulas you will use. The end result is a z score or t score 
which you can compare with the appropriate table. Let us start with the case of 
large samples. The experimenter has a theory that predicts that the hormone 
will stunt the growth of beans and she will consider only one end of the 
sampling distribution. She will use a significance level of 0.01. Her samples 
contain 100 beanstalks each. Outline an appropriate statistical test. 


Null hypothesis 
Alternative 
Significance level 


Critical region 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 < \mu_2$
Significance level    $\alpha = 0.01$
Critical region       $z \leq -2.33$

The formula used for computing a z score for the difference between two 
means is 

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$


For the following data compute a z score: 


    Sample 1                  Sample 2
    (treated)                 (untreated)
    $n_1 = 100$               $n_2 = 100$
    $\bar{x}_1 = 27$ in.      $\bar{x}_2 = 29$ in.
    $s_1 = 5$ in.             $s_2 = 4$ in.

z = ____


$$z = \frac{27 - 29}{\sqrt{5^2/100 + 4^2/100}} = \frac{-2}{\sqrt{25/100 + 16/100}} = \frac{-2}{0.64} = -3.13$$


The experimenter’s theory is supported by the statistical test. Treated 
beanstalks are significantly shorter. 
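The z score for two large independent samples is easy to compute by machine as well as by hand. The following is a minimal sketch; the function name `z_two_means` is an illustrative choice, and the data are the beanstalk figures from this frame.

```python
import math


def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Return z = (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2) for two large samples."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Treated beanstalks: mean 27 in., s = 5 in.; untreated: mean 29 in., s = 4 in.
z = z_two_means(27, 5, 100, 29, 4, 100)

# One-tailed test at the 1% level: reject when z is -2.33 or lower.
significant = z <= -2.33

print(round(z, 2), significant)
```

Without intermediate rounding the script gives a z of about -3.12. The hand calculation rounds the denominator to 0.64 first and so reports -3.13. Either way the value lies in the critical region.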




30. Reference formula

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$


A manufacturer suspects a difference in the quality of the spare parts he 
receives from two suppliers. He obtains the following data on the service life of 
random samples of parts from the two suppliers: 


    Supplier A              Supplier B
    $n_1 = 50$              $n_2 = 100$
    $\bar{x}_1 = 150$       $\bar{x}_2 = 153$
    $s_1 = 10$              $s_2 = 5$


Outline an appropriate statistical test, using the 1% significance level, and 
compute z. Is the difference between the two samples statistically significant? 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 \neq \mu_2$
Significance level    $\alpha = 0.01$
Critical region       $z \geq +2.58$ or $z \leq -2.58$

$$z = \frac{150 - 153}{\sqrt{10^2/50 + 5^2/100}} = \frac{-3}{\sqrt{2 + 0.25}} = \frac{-3}{1.5} = -2.0$$

z = -2.0 (not significant).


The two suppliers’ parts are not significantly different. 
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31. 


32. 


33. 


To use this type of test with small samples two assumptions must be made: 


(a) The two populations involved are normally distributed. 
(b) The populations have equal standard deviations; that is, $\sigma_1 = \sigma_2$.


Since the populations are unknown, you can never be certain if these assump- 
tions are valid, but you can make some judgments about their reasonableness by 
looking at the data; for example, consider the following cases: 


Case 1. You are testing for a significant difference between the following 
samples: 


Sample 1    1, 4, 6, 8, 10, 11, 12, 14, 16, 18 
            $\bar{x} = 10.0$, $s = 5.2$

Sample 2    2, 5, 6, 8, 11, 13, 14, 15, 17, 20 
            $\bar{x} = 11.1$, $s = 5.7$


Case 2. You are testing for a significant difference between the following 
samples: 


Sample 1    1, 4, 6, 8, 10, 11, 12, 14, 16, 18 
            $\bar{x} = 10.0$, $s = 5.2$

Sample 2    1, 1, 2, 2, 2, 3, 5, 10, 24, 38 
            $\bar{x} = 8.8$, $s = 12.4$


Which case better matches assumption (a) (case 1/case 2)? 


Case 1. Sample 2 in case 2 is skewed. 


Which case better matches assumption (b) (case 1/case 2)? 


Case 1. The values of s for the two samples in case 2 are quite different; one value 
is more than twice as large as the other. 


Using t to test the significance of the difference between the means of two 
independent small samples involves two assumptions: 


(a) Both distributions are ____
(b) The population standard deviations are ____


normal 
equal 


In practice, this t test will usually lead to the correct conclusion provided the two 
samples are equal in size and neither of the distributions is strongly skewed. 
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35. 


The formula used for computing a t score for the difference between two 
means is 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2/n_1 + s^2/n_2}}$$

In this formula $s^2$ is a pooled estimate of the population variance ($\sigma^2$), based on 
the s of both samples combined. The formula for computing $s^2$ is 

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$


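For the following data compute a t score: 


    Sample 1                Sample 2
    $n_1 = 10$              $n_2 = 10$
    $\bar{x}_1 = 10.0$      $\bar{x}_2 = 11.1$
    $s_1 = 5.2$             $s_2 = 5.7$


$$s^2 = \frac{9(5.2)^2 + 9(5.7)^2}{10 + 10 - 2} = \frac{535.77}{18} = 29.765$$

$$t = \frac{10.0 - 11.1}{\sqrt{29.765/10 + 29.765/10}} = \frac{-1.1}{\sqrt{59.53/10}} = \frac{-1.1}{\sqrt{5.95}} = \frac{-1.1}{2.44} = -0.45$$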
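The pooled variance and the t score can be computed together. The sketch below follows the two formulas given in this frame; the function name `pooled_t` and its return layout are illustrative assumptions, not part of any statistical package.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, pooled variance, degrees of freedom) for two small samples."""
    df = n1 + n2 - 2
    # Pooled estimate of the common population variance.
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    t = (mean1 - mean2) / standard_error
    return t, pooled_variance, df


# Data from this frame: both samples have n = 10.
t, pooled_variance, df = pooled_t(10.0, 5.2, 10, 11.1, 5.7, 10)

print(round(pooled_variance, 3), round(t, 2), df)
```

The same function works when the two samples differ in size, since each sample's variance is weighted by its own degrees of freedom.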


As you recall, when using the t table you must take into account the “degrees 
of freedom.” When you use only one sample, df = n - 1, but when you have 
two samples you must add the degrees of freedom for each sample. Then 
$df = (n_1 - 1) + (n_2 - 1)$ or, expressed in a simpler form, $df = n_1 + n_2 - 2$. 
What are the degrees of freedom for the problem in frame 33? 


18 (10 + 10 — 2) 


For the following statistical test, establish a critical region for the problem in 
frame 33 (df = 18). 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 \neq \mu_2$
Significance level    $\alpha = 0.05$


Critical region 


$t \geq +2.10$ or $t \leq -2.10$

37. When you use a computer to calculate a t test on the difference between two 
sample means, the result will look something like this: 


T-TEST

                                           T              2-TAIL
              N     MEAN      SD         VALUE     DF      PROB
GROUP 1      25    83.84    13.31
                                          3.50     36      0.00
GROUP 2      13    67.31    14.77


Look at the printout. What is the value for degrees of freedom? ____ Is it 
correct? (Yes/No) Are the results of the test significant at the 5% level? 
(Yes/No) 


df = 36 
Yes, it is correct. 
Yes, the results are significant. 


CHOOSING THE RIGHT TEST 


Using an inappropriate statistical test will give misleading results. To select an 
appropriate statistical test you must consider two questions: 


(a) What are your null hypothesis and alternative? 
(b) What assumptions and limitations apply to the statistical tests you might use? 


38. Let us look at some examples. A reading test is supplied by a national 
publisher. The test is accompanied by a manual which describes the results of 
the administration of the test to some 4000 tenth-grade students in various 
parts of the country. According to this manual, the distribution of scores on 
the test was approximately normal, with a mean of 30, and a standard 
deviation of 5. You wish to know if the performance of students in your city 
differs significantly from that of the 4000 students described in the manual, so 
you select a random sample of 25 tenth-grade students and test them. What are 
your null hypothesis and alternative for the statistical test? 


Null hypothesis 


Alternative 


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39. 


41. 


Null hypothesis       $\mu = 30$
Alternative           $\mu \neq 30$


The form of your null hypothesis tells you what statistical tests are available. 
Which of the following corresponds to your null hypothesis in this case? 


(a) $\mu = C$, where C is a known constant that you could state before the 
experiment; for example, $\mu = 0$ or $\mu = 10$. 
(b) $\mu_1 = \mu_2$, where $\mu_1$ and $\mu_2$ are both estimated on the basis of samples. 


(a) $\mu = C$


Look at the summary of formulas on page 253. What two formulas are listed 
for testing the hypothesis $\mu = C$? 


$$z = \frac{\bar{x} - C}{s/\sqrt{n}} \qquad t = \frac{\bar{x} - C}{s/\sqrt{n}}$$


The form of your alternative tells you whether your test is “one-tailed” or 
“two-tailed.” A two-tailed test is a test in which you take into account both 
ends of the sampling distribution when you set your critical region. It 
corresponds to an alternative such as $\mu \neq C$. A one-tailed test is concerned with 
only one end of the sampling distribution. It corresponds to an alternative such 
as $\mu > C$ or $\mu < C$. Consider your alternative in this problem. What type of 
test will you use? 


Two-tailed. You will reject the null hypothesis if the sample mean is either 
unusually low or unusually high. 
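The difference between the two kinds of critical region can be stated as a small decision rule. This is a sketch only: the function name and the labels "two", "upper", and "lower" are illustrative, and the critical value is always the positive number read from the z or t table.

```python
def in_critical_region(statistic, critical_value, tail):
    """Report whether a z or t score falls in the critical region.

    tail is "two" for an alternative such as mu != C,
    "upper" for mu > C, and "lower" for mu < C.
    critical_value is the positive table value.
    """
    if tail == "two":
        return abs(statistic) >= critical_value
    if tail == "upper":
        return statistic >= critical_value
    if tail == "lower":
        return statistic <= -critical_value
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


# A two-tailed test rejects for unusually low or unusually high scores.
print(in_critical_region(-3.0, 2.58, "two"))    # low score, rejected
print(in_critical_region(3.0, 2.58, "two"))     # high score, rejected
# A one-tailed (lower) test ignores the upper end entirely.
print(in_critical_region(3.0, 2.33, "lower"))   # not rejected
```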


42. 


43. 
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Now ask yourself the second question: ‘‘What assumptions and limitations 
apply to the statistical tests you might use’’? Use of a z score involves an 
assumption about sample size. What is it? 


The sample size is greater than 30. 


Using a t score involves an assumption about the distribution of your unknown 
population. What is it? 


The population is approximately normally distributed. 


The sample size in this problem is 25. Suppose you test the 25 students and find 
that their scores appear to be roughly normally distributed, with $\bar{x} = 26$ and 
$s = 6$. What will you use for a statistical test? 


(a) a z score 
(b) a t score 
(c) neither 


(b) a t score 




45. Using a 1% significance level, complete the statistical test: 


Null hypothesis       $\mu = 30$
Alternative           $\mu \neq 30$
Significance level    $\alpha = 0.01$
Critical region       $t \geq$ ____ or $t \leq$ ____

t = ____. The results are (significant/not significant). 


Critical region: $t \geq +2.797$ or $t \leq -2.797$ (df = 24) 

$$t = \frac{26 - 30}{6/\sqrt{25}} = \frac{-4}{6/5} = \frac{-4}{1.2} = -3.33$$

Significant 


Your sample does differ significantly from the sample described in the 
manual. 
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46. Using the same reading test with a sample of 25 students from a different city, 
you find that their scores are bimodally distributed, as indicated in the 
illustration. The city has a large minority of students who have learned 
English as a second language. 


[Figure: bimodal distribution of scores. Vertical axis: number of students; horizontal axis: score.] 


Again, you wish to know if the mean performance of all students in this city 
differs significantly from that of the 4000 students described in the test manual. 
What will you use for a statistical test in this case? 


(a) a z score 
(b) a t score 
(c) neither 


(c) neither. In this case you cannot assume that the sample comes from a nor- 
mally distributed population, and the sample is too small to use a z score. One 
solution would be to obtain a larger sample so that a z score could be used. 
Perhaps a more appropriate solution would be to treat the students who learned 
English as a second language as a separate population and draw two separate 
samples. 
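The choice among a z score, a t score, and neither, for a test of a single mean, can be summarized as a rule of thumb. The sketch below encodes only the two conditions stated in these frames (a sample larger than 30 for z, an approximately normal population for t); the function name is an illustrative assumption.

```python
def choose_test(sample_size, population_approx_normal):
    """Suggest a test for a hypothesis about one mean.

    Follows the rule of thumb in these frames: z needs a sample
    larger than 30; t needs an approximately normal population.
    """
    if sample_size > 30:
        return "z"
    if population_approx_normal:
        return "t"
    return "neither"


print(choose_test(25, True))     # small sample, roughly normal scores
print(choose_test(25, False))    # small sample, bimodal scores
print(choose_test(100, False))   # large sample, any distribution
```

When the answer is "neither", the remedies are those suggested above: obtain a larger sample, or treat the distinct subgroups as separate populations.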
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47. All tenth-grade students in your city have been tested. This population 
consists of 3000 tenth-graders. Their reading scores are distributed as shown 
below. A group of 100 from this population has been selected for a special 
enrichment program and you suspect that the selection procedures favored the 
better readers. To check your theory you obtain the reading scores of the 100 
students selected for the special program. What are the null hypothesis and 
alternative? 


[Figure: distribution of reading scores for the population. Vertical axis: percent; horizontal axis: score.] 


Null hypothesis       $\mu = 26$
Alternative           $\mu > 26$


48. Which of the following corresponds to your null hypothesis in this case? 
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49. 


51. 


The form of the null hypothesis indicates that statistical tests are available with 
z scores or t scores. The form of the alternative indicates that the test will be 
(one-tailed/two-tailed). 


one-tailed 


What assumptions and limitations apply to the available tests? For a z score? 
For a t score? 


For a z score n must be greater than 30. 
For a t score the sample is assumed to have come from a normally distributed 
population. 


Considering these assumptions, what will you use for a statistical test in this 
case? 


(a) a z score 
(b) a t score 
(c) neither 


(a) a z score. The sample is large enough to use z. 


Although the population is not normally distributed, most researchers would 
probably accept conclusions based on a t score for a smaller sample in this case. 
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52. Theoretical analysis leads a researcher to predict that the trout in one lake 
(call it Deep Lake) will have higher concentrations of chlorinated hydro- 
carbons in their livers than the trout in another lake (Blue Lake). He catches 
10 fish from each lake. What are the null hypothesis and alternative? 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 > \mu_2$


53. Look at the summary of formulas on page 253. What formulas are listed for 
testing this hypothesis? 


$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2/n_1 + s^2/n_2}}$$


54. The statistical test will be (one-tailed/two-tailed)? 


one-tailed 


55. 


57. 
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What assumptions apply to the z score test in this case? 


Both samples are larger than 30. 


What assumptions apply to the t score test in this case? 


The populations are normally distributed and have equal standard deviations. 


The two samples appear to be more or less normally distributed. The following 
amounts of chlorinated hydrocarbons are detected: 


    Sample 1 (Deep Lake)      Sample 2 (Blue Lake) 

    $\bar{x}_1 = 38$          $\bar{x}_2 = 4$
    $s_1 = 3$                 $s_2 = 2$
    $n_1 = 10$                $n_2 = 10$


Review the problem as outlined in frame 50 and the following frames and 
complete an appropriate statistical test. Do these results confirm the re- 
searcher’s theory? (Use the 1% significance level.) 
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59. 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 > \mu_2$
Significance level    $\alpha = 0.01$
Critical region       $t \geq +2.55$ (df = 18) 

$$s^2 = \frac{9(3)^2 + 9(2)^2}{18} = 6.5 \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{6.5/10 + 6.5/10}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{1.3}}$$


The researcher’s theory is confirmed. The t test is used because the samples are 
small and $s_1$ and $s_2$ are not drastically different. 


It is important to distinguish between independent samples and samples that 
are actually paired data and should be used to compute difference scores; for 
example, a statistically minded shopper wants to compare the prices of the 
two supermarkets in his neighborhood. From a list of all types of canned and 
frozen goods that are available he selects at random 43 different items. He 
then determines the price of each of these items at each store, so that he has a 
list of 43 prices for each of the two stores. Do his two lists of prices constitute 
independent samples or should he compute difference scores? 


He should compute difference scores, since he priced the same items at each 
store. 


What are an appropriate null hypothesis and alternative for a statistical test? 


Null hypothesis       $\mu = 0$
Alternative           $\mu \neq 0$


What formula should he use for a statistical test? 
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61. 


62. 


An economist is investigating price differences between suburban and central 
city stores. He determines the total price of a standard shopping list at a random 
sample of 18 suburban stores and 22 central city stores, so that he has 18 
shopping-list totals from suburban stores and 22 shopping-list totals from 
central city stores. Has he two independent samples? 


Yes. Selecting a given suburban store has no influence on what central city stores 
will be selected. 


Outline an appropriate statistical test. 
Null hypothesis 
Alternative 
Significance level    $\alpha = 0.05$
Critical region 
Formula for test 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 \neq \mu_2$
Significance level    $\alpha = 0.05$
Critical region       $t \geq +2.02$ or $t \leq -2.02$ (df = 38) 
Formula for test 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2/n_1 + s^2/n_2}} \qquad s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$


A last caution about one-tailed versus two-tailed tests. A one-tailed test is appro- 
priate only when you have clearly decided in advance of the experiment that results 
in the opposite direction from your alternative are of no interest whatsoever. You 
cannot look at the data and then decide to use a one-tailed test. For this reason 
many researchers always use two-tailed tests even when they have formed a tentative 
conclusion about the direction of the results. When in doubt, use a two-tailed test. 
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REVIEW PROBLEMS 


If you have successfully completed this chapter, you can now set up formal statistical 
tests of hypotheses about differences between the means of two sets of data. You 
can: 


• test the significance of a set of difference scores; 

• test the significance of the difference between the means of two independent 
samples; 

• recognize when you have independent samples and when you have paired data, 
which result in difference scores. 


Now try these review problems. Table I on pp. 253-256 lists any formulas you 
may need for reference. 


1. You are studying problem-solving performance, using time as a measure. With 
the kind of problem you are using the distributions you obtain are typically 
bimodal. Your subjects either solve the set of problems quickly or take a long 
time, but only rarely do they take an intermediate amount of time. You select a 
sample of 10 subjects and give them instruction that you believe will reduce 
their mean time for solving the problem set. You wish to compare their 
performance with that of another sample of 10 who did not receive the 
instruction. Outline an appropriate statistical test and suggest changes in the 
plan of the experiment if necessary. Use a 5% significance level. 


2. A laboratory owns two precision measuring devices. The director suspects that 
there is a slight difference in calibration between the two, so that one of them 
(he doesn’t know which) tends to give slightly higher readings than the other. 
He proposes to check the two devices by taking readings of 50 objects on both 
machines. Thus he will have readings of objects 1-50 on Machine A and 
readings of objects 1-50 on Machine B. Outline an appropriate test at the 5% 
significance level and suggest changes in the plan if necessary. 
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From appropriately selected samples two sets of IQ scores are obtained as 
summarized below: 


    Group 1               Group 2
    $n_1 = 16$            $n_2 = 14$
    $\bar{x}_1 = 107$     $\bar{x}_2 = 112$
    $s_1 = 10$            $s_2 = 8$


Is there a significant difference between the two groups? Use the 5% significance 
level. 


Answers 


To review a problem, study the frames indicated after the answer. 


1. You cannot perform a statistical test as the experiment is planned. The samples 
are too small to use a z test, and to use a t test you must be able to assume that 
the populations are approximately normally distributed. Your experience 
indicates that you cannot make this assumption. The solution is to use samples 
of 30 or more. With larger samples the statistical test is the following: 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 < \mu_2$
Significance level    $\alpha = 0.05$
Critical region       $z \leq -1.65$


The formula for z is 

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

See frames 24 to 37 and 38 to 62. 


2. You must use difference scores in this case because the samples are not 
independent. Since you have 50 difference scores, use z. 


Null hypothesis       $\mu = 0$
Alternative           $\mu \neq 0$
Significance level    $\alpha = 0.05$
Critical region       $z \leq -1.96$ or $z \geq +1.96$


The formula for z is 

$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 0}{s/\sqrt{50}} = \frac{\bar{x}}{s/\sqrt{50}}$$

See frames 1 to 23 and 38 to 62. 

3. The samples are small, so you must use t. In the absence of evidence to the 
contrary it is reasonable to assume that the populations of IQ scores are 
approximately normally distributed. 


Null hypothesis       $\mu_1 = \mu_2$
Alternative           $\mu_1 \neq \mu_2$
Significance level    $\alpha = 0.05$
Critical region       $t \leq -2.05$ or $t \geq +2.05$ (df = 16 + 14 - 2 = 28) 


The formula for t is 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2/n_1 + s^2/n_2}} \qquad s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$$s^2 = \frac{15(100) + 13(64)}{16 + 14 - 2} = \frac{1500 + 832}{28} = 83.3$$

$$t = \frac{107 - 112}{\sqrt{83.3/16 + 83.3/14}} = \frac{-5}{\sqrt{5.21 + 5.95}} = \frac{-5}{3.35} = -1.49$$


The difference between the two groups is not significant. See frames 31 to 37 
and 38 to 62. 


CHAPTER SIX 
The Difference Between Two 
Variances or Several Means 


The difference between two variances can be studied using another sampling 
distribution called the F distribution. The approach should, by now, be familiar. 
You compute a value for F using the data from your samples and use the sampling 
distribution table to determine if the value is in the critical region. 

A related technique called analysis of variance allows you to consider data from 
several samples at the same time. You try to distinguish systematic differences 
between sample groups from the chance variation found within each group. 
Analysis of variance requires you to compute an F value. 

When you have completed this chapter, you will be able to: 


• test the significance of the difference between two sample variances; 

• perform an analysis of variance to determine if the differences among the 
means of a group of samples are statistically significant; 

• determine if given data meet the assumptions of these tests. 


COMPARISON OF TWO VARIANCES 


Just as two means can be compared by using the sampling distribution of t, two 
variances can be compared by using the sampling distribution of F. The procedure is 
very simple. 

The variability of a population or a sample can be described by its standard 
deviation or by its variance. The variance is simply the square of the standard 
deviation; for example, if o for a population is 6, the variance of the population 
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The formula for the F ratio is 
$$F = \frac{s_1^2}{s_2^2}$$


The usual way to compare the variability of two samples is to divide the larger 
variance by the smaller variance. The result of this computation is called an 
F ratio; for example, if the variance of one sample is 25 and the variance of 
another sample is 4, the F ratio is 25/4 or 6.25. If the variance of one sample 
is 36 and the variance of another is 9, the F ratio is ____.


36/9 = 4.0 

4. If one sample has a variance of 5 and the other a variance of 25, the F ratio 
is ____.


25/5 = 5.0 


5. An F ratio close to 1.0 indicates that the two samples have (similar/different) 
variances. 


similar 


6. Two samples with very different variances would result in an F ratio that is 
(close to 1.0/large). 


7. In computing an F ratio, you normally divide the (larger/smaller) variance by 
the (larger/smaller) variance. 


larger 
smaller 
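The rule of dividing the larger variance by the smaller can be written as a one-line function. This is a minimal sketch; the name `f_ratio` is illustrative.

```python
def f_ratio(variance_a, variance_b):
    """Return the F ratio: the larger variance divided by the smaller."""
    larger = max(variance_a, variance_b)
    smaller = min(variance_a, variance_b)
    return larger / smaller


# The three examples from the preceding frames.
print(f_ratio(25, 4))    # 6.25
print(f_ratio(36, 9))    # 4.0
print(f_ratio(5, 25))    # 5.0, regardless of the order of the arguments
```

Because the larger variance is always the numerator, an F ratio computed this way can never be less than 1.0.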
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8. 


10. 


If two samples are drawn at random from populations with the same variance, 
their F ratio is most likely to be close to 1.0. A large F ratio will be relatively 
unlikely. For normally distributed populations it is possible to construct 
theoretical sampling distributions for F. This fact can be used to perform a 
statistical test of a hypothesis about the difference between the variance of two 
samples; for example, a researcher believes that stress will increase the vari- 
ability of a sample’s test scores. He administers the same test to two groups of 
25 college applicants. One group is told that this test will be important in 
determining whether they will be admitted. The other group is told that the test 
is being administered for research purposes only. His statistical test is as follows: 


Null hypothesis       $\sigma_1 = \sigma_2$
Alternative           $\sigma_1 > \sigma_2$
Significance level    $\alpha = 1\%$
Critical region       $F \geq 2.66$ (This is obtained from a table, as you will soon 
see.) 

His results are 

    Group 1                 Group 2
    (stress)                (no stress)
    $n_1 = 25$              $n_2 = 25$
    $\bar{x}_1 = 120$       $\bar{x}_2 = 110$
    $s_1 = 20$              $s_2 = 10$
    $s_1^2 = 400$           $s_2^2 = 100$

Compute F. F = ____. 


F = 4.0 


The tables used to establish a critical region for this test will tell you that a 
given value of F would occur by chance no more than 5% of the time, for 
example, or no more than 1% of the time. You will find such a table on page 271. 
Look at it. What is its title? 


“Critical Points of the F Distribution” 




11. 


13. 


14. 


To use the F table you must know the degrees of freedom for both $s_1^2$ and $s_2^2$. 
As in the t test, df = n - 1. 


As an example,    $s_1^2 = 36$      $s_2^2 = 25$
                  $n_1 = 15$        $n_2 = 21$


The degrees of freedom for $s_1^2$ are 14. The degrees of freedom for $s_2^2$ are ____.


20 


Look at the F table and locate the area on the table that corresponds to 14 
degrees of freedom for $s_1^2$ and 20 for $s_2^2$. You should find two values of F 
in this area. What are they? 


2.23 and 3.13 


According to the table, how often would you expect an F of 3.13 or greater 
to occur by chance with these sample sizes? 


1% of the time 


5% of the time chance would produce an F greater than ____.


2.23 


Assume for a moment that you are operating a frozen grapefruit juice factory. 
To buy fruit at favorable prices you must buy the grower’s entire crop, but 
unusually large or unusually small fruit can jam your machinery and must be 
eliminated before squeezing. For this reason you would like to buy crops that 
are relatively uniform in size; that is, you prefer crops with a (large/small) 
variance. 


16. 


17. 


18. 
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Two growers offer you their crops. Grower A asks a slightly higher price than 
Grower B, but he says that his grapefruit are more uniform in size. To check 
this assertion you ask for a random sample of each crop. Each grower sends 
you a crate of 25 grapefruit. You measure the grapefruit in each sample and 
obtain the following information: 


(a) The size of the fruit is approximately normally distributed for both 
samples. 

(b) For Grower A the mean diameter of the fruit is 4.5 in., with a standard 
deviation of 0.5 in. 

(c) For Grower B the mean diameter of the fruit is 4.5 in., with a standard 
deviation of 1 in. 


State an appropriate null hypothesis and alternative for a statistical test. 


Null hypothesis       $\sigma_1 = \sigma_2$
Alternative           $\sigma_1 < \sigma_2$


For a significance level of 5% what is the critical region? (Use the table.) 


F 2 


Compute F. Are the results significant? 


The results are significant. The fruit from Grower A is more uniform in size. 
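The F ratio for the two growers can be worked out from the standard deviations given in the problem. The sketch below assumes only those figures (s = 0.5 in. for Grower A, s = 1 in. for Grower B, 25 fruit in each crate); the function name is illustrative, and the critical value must still be read from the F table.

```python
def f_from_standard_deviations(sd_a, n_a, sd_b, n_b):
    """Return (F, df for numerator, df for denominator).

    The larger variance goes in the numerator, and each variance
    carries n - 1 degrees of freedom.
    """
    var_a, var_b = sd_a ** 2, sd_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Grower A: s = 0.5 in.; Grower B: s = 1 in.; 25 grapefruit in each sample.
f_value, df_numerator, df_denominator = f_from_standard_deviations(0.5, 25, 1.0, 25)

print(f_value, df_numerator, df_denominator)
```

Here Grower B's variance is the larger one, which is the direction the alternative predicts, so the F ratio can be compared directly with the one-tailed table value for 24 and 24 degrees of freedom.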
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19. 


20. 


21. 


22. 


In the cases you have considered so far the alternative called for a one-tailed 
statistical test; that is, your theory specified which of the two variances would 
be larger. Sometimes your alternative is two-tailed. It says only that $\sigma_1 \neq \sigma_2$ 
without stating which one will be larger. In this case you will use whichever 
variance happens to be larger as the numerator of the F ratio but you must 
double the probabilities when you use the F table. The 5% values become 10% 
values and the 1% values become ____% values. 


2% 


You have been given the following statistics on two samples. From experience 
you are quite certain that the populations are normally distributed but you 
suspect that the two populations are not equally variable. 


    Sample 1          Sample 2
    $n_1 = 10$        $n_2 = 10$
x= 88 5 a ad 
aoe s=- D> 


What would be an appropriate null hypothesis and alternative for investigating 
this matter? 


Null hypothesis       $\sigma_1 = \sigma_2$


Alternative           $\sigma_1 \neq \sigma_2$


Using a 2% significance level, establish a critical region. 


Critical region       $F \geq 5.35$


Compute F. 
12? 144 
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23. 


Will a t test be appropriate in this case? 


No. The difference between the variances of the two samples is exceptionally 
large. It is not likely that they came from populations with the same variance. 


ANALYSIS OF VARIANCE 


The F ratio can be used to test the null hypothesis that a number of samples all 
come from populations with the same mean. The procedure used for this test is 
called analysis of variance. 


24. 


25. 


26. 


In the analysis of variance we use the difference among sample means to 
estimate the variance of the population. We also make a separate estimate of 
the population variance based only on the differences among individuals 
within each sample. If the samples all come from populations with the same 
mean, the differences between sample means will be relatively (large /small). 


small 


If the differences between sample means are relatively large, we will conclude 
that the mean of the populations (is/is not) the same. 


The estimate of variance based on the differences between the means of groups 
is called “between groups variance.” The estimate of variance based only on 
the differences between individuals is called “within groups variance” or “error 
variance.” To reject the null hypothesis, between groups variance must be large 
compared with ____.


within groups variance or error variance 
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27. Let us see how between groups variance and within groups variance can 
be estimated. The purpose of this explanation is to show you the basic logic 
of the test. You will learn simplified procedures for computation later. A 
researcher believes that the color of a toy will affect how long children will 
play with it. From a population of preschool children he obtains four sample 
groups of 10 children each. Using the same stuffed animal, but in a different 
color for each sample group, he observes how many minutes each child in each 
sample group spends playing with the toy during a 10-minute session. His null 
hypothesis is that $\mu_1 = \mu_2 = \mu_3 = \mu_4$. The alternative is that not all the means 
are equal; that is, that the color of the toy does make a difference. His data look 
like this: 

   Group 1          Group 2           Group 3          Group 4 
(red giraffe)   (yellow giraffe)  (green giraffe)   (blue giraffe) 

      1                3                 2                5 
      2                2                 4                3 
      5                6                 2                1 
      7                3                 1                2 
      6                2                 2                1 
      1                8                 3                3 
      2                7                 4                4 
      2                5                 1                2 
      4                6                 3                3 
      4                8                 2                1 

$n_1 = 10$          $n_2 = 10$           $n_3 = 10$          $n_4 = 10$ 
$\bar{x}_1 = 3.4$   $\bar{x}_2 = 5.0$    $\bar{x}_3 = 2.4$   $\bar{x}_4 = 2.5$ 
$s_1^2 = 4.5$       $s_2^2 = 5.6$        $s_3^2 = 1.2$       $s_4^2 = 1.8$ 


All groups combined 


$N = 40$ 
$\bar{x} = 3.3$ 
$s^2 = 4.1$ 


The between groups variance estimate is based on the differences between 
(individuals/group means). 




group means 


28. Consider the distribution of the four group means. This distribution is 


(a) a population distribution 
(b) a sample distribution 
(c) a sampling distribution 


(c) a sampling distribution 


29. As you recall from our discussion of the central limit theorem, 

$\sigma_{\bar{x}} = \sigma / \sqrt{n}$ 

that is, the standard deviation of the sampling distribution of $\bar{x}$ depends on two 
things, $\sigma$ and $n$. 


What does $\sigma$ stand for? 
What does $n$ stand for? 


$\sigma$ is the standard deviation of the population. 
$n$ is the sample size. 


30. At the moment we are interested in computing $s^2$, an estimate of the variance 
of the population. We can find it by using a formula based on the formula above. 
Using $s$ in the place of $\sigma$ to indicate that this is an estimate, we can say 
$s_{\bar{x}} = s/\sqrt{n}$. Then $s_{\bar{x}}^2 = s^2/n$, and $s^2 = n s_{\bar{x}}^2$. What is the size of the samples 
that make up our sampling distribution? $n$ = ____ 


10 


31. $s_{\bar{x}}^2$ is simply the variance of the following numbers: 


3.4, 5.0, 2.4, and 2.5. 


What are these numbers? 


The means of the four samples. 
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32. 


33. 


34. 


The variance of the sample means works out to be 1.45. (If you would like 
practice, check it for yourself.) 


On the basis of this information, what is the variance of the total population? 
$s^2 = n s_{\bar{x}}^2 =$ ____ 


10(1.45) = 14.5 


14.5 is the (between/within) groups variance estimate. 


between 


The between groups variance estimate is based on the variance of (individual 
scores/group means). 


group means 
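
The arithmetic in the last few frames can be checked with a short Python sketch. It assumes the four group means and the equal group size of 10 from the toy-color example; the variable names are illustrative only and are not taken from the book.

```python
from statistics import variance

# Means of the four sample groups in the toy-color example.
group_means = [3.4, 5.0, 2.4, 2.5]
n = 10  # children per group (equal group sizes assumed)

# Sample variance of the four means (divides by 4 - 1).
var_of_means = variance(group_means)

# Between groups estimate of the population variance: s^2 = n * s_xbar^2.
between_estimate = n * var_of_means

print(round(var_of_means, 2), round(between_estimate, 1))
```

Rounded to the precision used in the text, the two printed values agree with the 1.45 and 14.5 quoted above.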
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35. 


36. 


The within groups variance estimate is based on the variances of the individual 
scores within sample groups. What are the variances of individual scores within 
the sample groups? 


$s_1^2$ = ____ 
$s_2^2$ = ____ 
$s_3^2$ = ____ 
$s_4^2$ = ____ 


$s_1^2 = 4.5$ 
$s_2^2 = 5.6$ 
$s_3^2 = 1.2$ 
$s_4^2 = 1.8$ 
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37. 


The variance of each of these groups is an estimate of the total population 
variance. To obtain an even better estimate we can average all four group 
variances. This is our within groups variance estimate. Compute the within 
groups variance estimate for this example. 


$\dfrac{4.5 + 5.6 + 1.2 + 1.8}{4} = 3.28$ 


38. Compute the F ratio. 


$F = \dfrac{\text{between groups variance}}{\text{within groups variance}} =$ ____ 


$F = \dfrac{14.5}{3.28} = 4.42$ 


39. To evaluate this F ratio we still need to know the degrees of freedom for the 
between groups and within groups variance. The between groups variance 
estimate is based on four means. Degrees of freedom for this variance are 4 − 1 
= 3. The within groups variance is an average of several variances and the 
degrees of freedom for it are the total of the degrees of freedom for each of the 
groups. What are the degrees of freedom for one group of 10? 
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40. 


41. 


42. 


43. 


What are the total degrees of freedom for four groups of 10? 


4 × 9 = 36 


The degrees of freedom for $s_b^2$ (between groups variance) are ____. The 
degrees of freedom for $s_w^2$ (within groups variance) are ____. For a 1% 
significance level the critical region is $F \geq$ ____. 


3 
36 
$F \geq 4.38$ 


Will the experimenter reject the null hypothesis? 


Yes, because the F ratio is in the critical region. He can conclude that the 
color of the toy does make a difference in its attractiveness. 
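
As a rough check on the worked example, the following sketch recomputes the F ratio from the group means and variances quoted in the frames above. It assumes equal group sizes (so that the within groups estimate is a plain average of the four variances) and takes the critical value 4.38 from the text rather than computing it; all names are illustrative.

```python
from statistics import mean, variance

n = 10                                   # children per group
group_means = [3.4, 5.0, 2.4, 2.5]       # from the data table
group_variances = [4.5, 5.6, 1.2, 1.8]   # from the data table

# Between groups estimate: n times the variance of the group means.
between = n * variance(group_means)

# Within groups estimate: average of the group variances (equal n only).
within = mean(group_variances)

f_ratio = between / within
df_between = len(group_means) - 1        # 4 - 1
df_within = len(group_means) * (n - 1)   # 4 x 9

critical_f = 4.38                        # 1% value quoted in the text
reject_null = f_ratio >= critical_f

print(round(f_ratio, 2), df_between, df_within, reject_null)
```

The sketch covers only the equal-size case; unequal groups need the sums of squares method introduced next.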


In the example you have just considered the sample groups were of equal size. 
When the sample groups are not of equal size, it is necessary to weight the means 
and variances of the sample groups according to their sizes. The computations 
are more easily done when you use “sums of squares” instead of means and 
variances. A sum of squares is the numerator (upper term) of a variance 
estimate. Circle the part of the following formula that represents the sum of 
squares. 


$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ 

The numerator, $\sum (x - \bar{x})^2$, is the sum of squares. 
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44. 


The between groups sum of squares is the sum of squares you will use to 
compute the between groups variance estimate. The within groups sum of 
squares is the sum of squares you will use to compute the within groups variance 
estimate. If you consider all the groups combined as a single large sample, you 
can also compute a total sum of squares. To compute the variance of all the 
observations combined as a single sample you would use the ________ sum of 
squares. 


total 


45. Adding the between groups sum of squares to the within groups sum of squares 
will always give you the total sum of squares. This fact can simplify 
computations; for example, if you know that the total sum of squares is 75 and the 
between groups sum of squares is 25, the within groups sum of squares must 
be ____. 


50 


In practical work the mathematical procedures for analysis of variance are 
normally done by computer. To understand the terminology and to appreciate 
what you are missing you nevertheless ought to work through a few problems by 
hand. If you have no tolerance for computation, at least read through the 
explanations of the following problems step by step. You will need this background 
to understand computer output. 
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46. To perform an analysis of variance of data from varying size groups start by 


setting up a table like this: 

   Group 1         Group 2         Group 3         Group 4         Total 

   $x$   $x^2$     $x$   $x^2$     $x$   $x^2$     $x$   $x^2$ 
    3      9        2      4        3      9        5     25 
    3      9        4     16        5     25        6     36       $\sum x_T = 125$ 
    5     25        4     16        5     25        6     36       $\sum x_T^2 = 799$ 
    6     36        6     36        6     36        7     49       $N = 22$ 
    8     64        9     81        7     49 
                   10    100        7     49 
                                    8     64 
   25    143       35    253       41    257       24    146 
   $n_1 = 5$       $n_2 = 6$       $n_3 = 7$       $n_4 = 4$ 


From this table you can read all the information you will need to complete the 
analysis of variance. Notice that there are two columns for each group. The 
first column lists the scores for the group. The second column lists ________. 


the scores squared 


47. Do you see any columns for $(x - \bar{x})$ or $(x - \bar{x})^2$? 


No 


48. What symbol is used to indicate the number of observations in group 1? 


$n_1$ 


49. What symbol is used to indicate the total number of observations in all groups 
combined? 
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50. 


51. 


52. 


What symbol is used to indicate the grand total of all the observations? 


You can compute all the necessary sums of squares, using only $\sum x$ and $\sum x^2$ for 
the various groups. The formulas are as follows: 


Total sum of squares 

$\sum x_T^2 - \dfrac{(\sum x_T)^2}{N}$ 

Degrees of freedom for the total sum of squares depend on the total size of all 
samples combined: df = N − 1. 


Between groups sum of squares 

$\dfrac{(\sum x_1)^2}{n_1} + \dfrac{(\sum x_2)^2}{n_2} + \cdots - \dfrac{(\sum x_T)^2}{N}$ 

Degrees of freedom for the between groups sum of squares depend on the 
number of groups. We use g to indicate the number of groups. 

df = g − 1 


Within groups sum of squares 

total sum of squares − between groups sum of squares, or 

$\left[\sum x_1^2 - \dfrac{(\sum x_1)^2}{n_1}\right] + \left[\sum x_2^2 - \dfrac{(\sum x_2)^2}{n_2}\right] + \cdots$ 

Degrees of freedom for the within groups sum of squares depend on the degrees 
of freedom for each of the individual groups: df = $(n_1 - 1) + (n_2 - 1) + \cdots$. A 
simpler way to calculate this is df = N − g. 


In the example above what value from the table corresponds to $\sum x_T$? 


125 


What value corresponds to $\sum x_T^2$? 


799 
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53. 


55. 


36. 


What value will correspond to $(\sum x_T)^2$? 


15,625, or $125^2$ 


What value corresponds to $\sum x_3^2$? 


257 


What value corresponds to $(\sum x_1)^2$? 


625, or $25^2$ 


Calculate the total sum of squares for the data in frame 46. Refer back to the 
formulas if you wish. 


$799 - \dfrac{125^2}{22} = 799 - 710.2 = 88.8$ 
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37. 


58. 


59. 


Calculate the between groups sum of squares. 


$\dfrac{25^2}{5} + \dfrac{35^2}{6} + \dfrac{41^2}{7} + \dfrac{24^2}{4} - \dfrac{125^2}{22} = 125 + 204.2 + 240.1 + 144 - 710.2 = 3.1$ 


Calculate the within groups sum of squares. 


88.8 — 3.1 = 85.7 
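
The hand computation above can be mirrored in a short Python sketch. It assumes the scores as they appear in the frame 46 table and uses only the sums and sums of squared scores, as the formulas do; the function name `sums_of_squares` is ours, not the book's.

```python
def sums_of_squares(groups):
    """Return (total, between, within) sums of squares.

    Uses only sum(x) and sum(x^2), following the computational formulas.
    """
    n_total = sum(len(g) for g in groups)
    sum_x_total = sum(sum(g) for g in groups)
    sum_x2_total = sum(x * x for g in groups for x in g)

    correction = sum_x_total ** 2 / n_total           # (sum x_T)^2 / N
    total = sum_x2_total - correction
    between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    within = total - between                          # total = between + within
    return total, between, within


# Scores from the frame 46 table.
groups = [
    [3, 3, 5, 6, 8],
    [2, 4, 4, 6, 9, 10],
    [3, 5, 5, 6, 7, 7, 8],
    [5, 6, 6, 7],
]

total, between, within = sums_of_squares(groups)
df_between = len(groups) - 1                          # g - 1
df_within = sum(len(g) for g in groups) - len(groups) # N - g
f_ratio = (between / df_between) / (within / df_within)

print(round(total, 1), round(between, 1), round(within, 1))
print(df_between, df_within, round(f_ratio, 2))
```

Because the function works from raw scores, the same call can be reused for groups of any sizes.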


Complete the following table: 


Sum of Squares df Variance Estimate 


Within groups 
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61. 


62. 


Sum of Squares df Variance Estimate 


What is the F ratio for these data? F = 


Let’s review the meaning of between groups and within groups variance. The 
between groups variance estimate is based on the means of the groups. The 
within groups variance estimate is based on the variances of the individual 
groups. Which variance estimate reflects only the chance variations involved in 
drawing a sample? 


Within groups. The between groups variance also reflects the intentional 
difference between groups. 


Which variance estimate reflects the intentional differences between groups? 


Between groups 
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63. 


64. 


65. 


66. 


If the between groups variance is less than the within groups variance, you can 
say that the differences between groups are (large/small) compared with the 
chance variation involved in drawing a sample. 


Chance, the chance variations involved in drawing a sample. 


What does the between groups variance estimate reflect? 


The differences between groups 


If $s_b^2$ is smaller than $s_w^2$, can the results be significant? 
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67. The following scores represent the performance on a trouble-shooting test of 
small random samples of people with different educational backgrounds. 
Does educational background appear to make a difference in the performance 
on this test? Use a significance level of 1%. 


Group 1 (high school)           1, 3, 4 
Group 2 (technical institute)   4, 5, 6, 6, 7, 8, 9 
Group 3 (college)               2, 3, 3, 4 


Suggestions: 


(a) Set up a table for the data like the one in frame 46. 

(b) Compute the sums of squares and the variance estimates and summarize 
them in a table like the one in frame 59. 

(c) Calculate F and refer to the table to determine if it is significant. 


   Group 1         Group 2         Group 3         Total 

   $x$   $x^2$     $x$   $x^2$     $x$   $x^2$ 
    1      1        4     16        2      4 
    3      9        5     25        3      9       $\sum x_T = 65$ 
    4     16        6     36        3      9       $\sum x_T^2 = 371$ 
                    6     36        4     16       $N = 14$ 
                    7     49 
                    8     64 
                    9     81 

    8     26       45    307       12     38 

   $n_1 = 3$       $n_2 = 7$       $n_3 = 4$ 


                   Sum of Squares     df     Variance Estimate 

Between groups          44.8           2           22.40 
Within groups           24.4          11            2.22 


Critical region $F \geq 7.20$ 


$F = \dfrac{22.40}{2.22} = 10.1$ 


The null hypothesis is rejected. Educational background does make a difference. 


68. Look at this printout of an analysis of variance calculated by computer. 


SOURCE             DF     SUM SQUARES     F RATIO     PROB 
BETWEEN GROUPS      5        361.3217      11.526     0.000 
WITHIN GROUPS     494       3097.3463 


TOTAL             499       3458.6680 


Note the degrees of freedom; how many groups were used in the experiment? 
How many individuals were studied? 


Are the differences among groups significant at 
the 1% level? (Yes/No) 


6 groups (df = 6 — 1) 
500 individuals (df = 500 — 6) 
Yes, the results are significant. 
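
One way to read a printout like this is to rebuild the F ratio from its other columns. The sketch below assumes the printout's F ratio is the between groups variance estimate divided by the within groups variance estimate, as in the hand method; the variable names are illustrative.

```python
# Values copied from the printout in frame 68.
ss_between, df_between = 361.3217, 5
ss_within, df_within = 3097.3463, 494

# Variance estimate = sum of squares / degrees of freedom.
var_between = ss_between / df_between
var_within = ss_within / df_within
f_ratio = var_between / var_within

# The degrees of freedom also reveal the design of the study.
n_groups = df_between + 1                    # df = g - 1
n_individuals = df_between + df_within + 1   # total df = N - 1

print(round(f_ratio, 3), n_groups, n_individuals)
```

The recomputed ratio should agree with the F RATIO column to within rounding.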
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WHEN TO USE ANALYSIS OF VARIANCE 


Certain assumptions apply to the analysis of variance. Only when these assumptions 
are met is it appropriate to use analysis of variance to test your hypothesis. 


69. Analysis of variance and the t test for the difference between two means are 
based on similar mathematical derivations. They depend on the same 
assumptions and they yield the same results when applied to two samples. Both tests 
require what two assumptions? 


The population distributions are normal. 
The population variances are equal. 


70. If you know that the distribution of family incomes is strongly skewed, will it 
be appropriate to use analysis of variance to study the effect of level of 
education on family income? 


No. You cannot assume that the populations are normally distributed. 


71. Will it be appropriate to use a t test? 


No. The t test also assumes normally distributed populations. 


72. If comparisons are to be made among several different samples, t tests are not 
appropriate; for example, if you are comparing four sample means, six different 
two-way comparisons are possible. If you select a significance level of 5%, each 
individual comparison has a 5% chance of a Type I error. For all six 
comparisons you have substantially more than a 5% chance of at least one Type I 
error. The analysis of variance avoids this problem because it makes only one 
comparison (between $s_b^2$ and $s_w^2$). Therefore, when you are making 
comparisons among a number of samples, the appropriate test to use is (a t test/ 
analysis of variance). 


analysis of variance 
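
The size of the problem can be sketched numerically. The calculation below assumes, for simplicity, that the six comparisons are independent, which is not strictly true when they share the same samples, so the result should be read as a rough illustration rather than an exact error rate.

```python
from math import comb

alpha = 0.05     # significance level for each individual t test
n_groups = 4     # number of sample means being compared

# Number of different two-way comparisons among the groups.
n_comparisons = comb(n_groups, 2)

# Chance of at least one Type I error across all comparisons,
# ASSUMING the comparisons are independent (a simplification).
familywise_error = 1 - (1 - alpha) ** n_comparisons

print(n_comparisons, round(familywise_error, 3))
```

Under that assumption the chance of at least one false alarm is roughly five times the nominal 5%.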
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73. The following is a brief summary of the adjustment scores of mental patients 
treated with three different drugs. Is analysis of variance an appropriate 
statistical technique to apply in this case? 


Group 1            Group 2            Group 3 

$\bar{x} = 5$      $\bar{x} = 10$     $\bar{x} = 4$ 
$s = 4$            $s = 23$           $s = 5$ 
$n = 10$           $n = 33$           $n = 45$ 


No 


74. What assumption is violated by the above data? 


The variances are very unequal ($s_1^2 = 16$, $s_2^2 = 529$, $s_3^2 = 25$). In this case, you 
might be able to restructure your data and apply the Chi square test discussed 
in Chapter 8. 


REVIEW PROBLEMS 


If you have successfully completed this chapter you can now use the F distribution 
to perform statistical tests. You can: 


• test the significance of the difference between two sample variances; 

• perform an analysis of variance to determine if the differences among a number 
of sample means are statistically significant; 

• recognize situations where F tests should not be used because the data do not 
meet the assumptions of the mathematical model. 


Now try these review problems. Table I on pp. 253-256 lists any formulas you 
may need for reference. 


1. A number of students are assigned randomly to three classes with three different 
teaching methods. The following statistics summarize the performance of the 
three groups on a comprehensive final exam. Can you perform an analysis of 
variance with these data? What assumptions are involved? 


Group 1            Group 2            Group 3 

$n = 10$           $n = 11$           $n = 8$ 
$\bar{x} = 89$     $\bar{x} =$ …      $\bar{x} = 90$ 
$s^2 = 100$        $s^2 = 81$         $s^2 = 64$ 
$s = 10$           $s = 9$            $s = 8$ 

2. The situation is the same as in problem 1 but the data are different. Can you 
perform an analysis of variance? What assumptions are involved? 


Group 1            Group 2            Group 3 

$n = 10$           $n = 11$           $n = 8$ 
$\bar{x} = 90$     $\bar{x} = 86$     $\bar{x} = 70$ 
$s^2 = 144$        $s^2 = 25$         $s^2 = 16$ 
$s = 12$           $s = 5$            $s = 4$ 


3. You believe that a new production procedure will reduce the variability in size 
of molded plastic parts. You obtain samples of parts molded under the new and 
old procedures and measure them as summarized below. Outline an appropriate 
statistical test at the 5% significance level. Is the difference between the 
variability of the two samples significant? Is the difference between their means 
significant? 

Group 1               Group 2 

$n = 15$              $n = 25$ 
$\bar{x} = 20.00$     $\bar{x} = 21.00$ 
$s^2 = 0.0625$        $s^2 = 1.00$ 
$s = 0.25$            $s = 1.00$ 


4. Perform an analysis of variance of the following data. Use a 5% significance level. 


Group | Group 2 Group 3 
   1          2          3 
   2          3          4 
   2          4          4 
   2          4          5 
   3          5 
              6 
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Answers 
To review a problem, study the frames indicated after the answer. 


1. Yes, an analysis of variance is possible. You must be prepared to assume that 
the populations from which you are sampling are normally distributed. To 
interpret the results you must also be prepared to assume that the teaching 
method is the only reasonable explanation of the differences between groups. 
See frames 69 to 74 and frames 26 to 31 in Chapter 4. 

2. No, you cannot perform an analysis of variance. One of the assumptions for 
analysis of variance is that the populations sampled have equal variances. The 
variances of these three samples are too different to allow this assumption. The 
variance of group 1 is nine times as great as that of group 3 and almost six 
times as great as that of group 2. See frames 1 to 23 and 69 to 74. 

3. Null hypothesis $\sigma_1 = \sigma_2$ 

Alternative $\sigma_2 > \sigma_1$ 

Significance level $\alpha = 0.05$ 

Critical region $F \geq 2.35$ (df = 24, 14) 

$F = \dfrac{s_2^2}{s_1^2} = \dfrac{1.00}{0.0625} = 16$ 

The difference in variances is significant. With this substantial difference in 
variances, the accuracy of a t test would be in doubt. See frames 1-23. 

4. 
   Group 1         Group 2         Group 3         Total 

   $x$   $x^2$     $x$   $x^2$     $x$   $x^2$ 
    1      1        2      4        3      9 
    2      4        3      9        4     16       $\sum x_T = 50$ 
    2      4        4     16        4     16       $\sum x_T^2 = 194$ 
    2      4        4     16        5     25       $N = 15$ 
    3      9        5     25 
                    6     36 

   10     22       24    106       16     66 

   $n_1 = 5$       $n_2 = 6$       $n_3 = 4$ 

Total sum of squares 

$\sum x_T^2 - \dfrac{(\sum x_T)^2}{N} = 194 - \dfrac{(50)^2}{15} = 194 - \dfrac{2500}{15} = 194 - 166.7 = 27.3$ 

df = N − 1 = 14 


Between groups sum of squares 

$\dfrac{(\sum x_1)^2}{n_1} + \dfrac{(\sum x_2)^2}{n_2} + \dfrac{(\sum x_3)^2}{n_3} - \dfrac{(\sum x_T)^2}{N} = \dfrac{10^2}{5} + \dfrac{24^2}{6} + \dfrac{16^2}{4} - \dfrac{50^2}{15} = 20 + 96 + 64 - 166.7 = 13.3$ 

df = g − 1 = 2 

Within groups sum of squares 

total − between = 27.3 − 13.3 = 14.0 
df = N − g = 12 


                   Sum of Squares     df     Variance Estimate     F 

Between groups          13.3           2            6.65          5.7 
Within groups           14.0          12            1.17 


The critical region is $F \geq 3.89$ for a 5% significance level. The difference between 
groups is significant. See frames 24 to 67. 


CHAPTER SEVEN 
The Relation Between Two Sets of Measures 


So far, we have been concerned with only one kind of data at a time. In this 
chapter you will learn some ways of studying the relationships between two 
different measures. For example, you may wonder if the height of a tree is related 
to its age, or if wealth and happiness go together. A scattergram can be used to 
display graphically the relation between two different measures in a sample. It is 
also possible to summarize the relation between two measures quantitatively, using 
a correlation coefficient. And, given certain assumptions, it is possible to test the 
significance of a correlation coefficient by referring to a sampling distribution. 

If two measures are related, it is possible to use one measure for an individual to 
predict the other. This is done by means of a regression equation. For example, if 
wealth and happiness are related, you might be able to apply a regression equation 
to a person’s happiness rating to determine his or her approximate wealth. 

When you have completed this chapter, you will be able to: 


• construct and interpret a scattergram; 

• compute and interpret the correlation coefficient r; 

• test the null hypothesis that two measures have zero correlation; 

• use a regression equation to predict the value of one measure on the basis of a 
related measure. 
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SCATTERGRAMS 


Often we are interested in the relationship between two different measurements or 
observations in a population; for example, do high scores on a scholastic aptitude 
test tend to go with high grade point averages or are the two measures unrelated? Do 
rich families tend to have many children or few children, or is there no relation at all 
between wealth and the number of children in a family? One simple way to examine 
such a relationship is to draw a scattergram. 


1. Look at the scattergram below. Each x on the scattergram represents a family 
in our sample. The height of the x on the graph represents the number of 
children in the family. The left-right position of the x represents the family 
income; for example, A represents a family with three children and an income 
of $8000. B represents a family with ________ children and an income of ________. 

[Figure: scattergram of number of children (vertical axis) against family income (horizontal axis)] 

two 
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2. In the scattergram below circle the x that represents a family with no children 
and an income of $6000. 


[Figure: scattergram of number of children (vertical axis) against family income (horizontal axis, $6,000 to $18,000)] 

3. Looking at the scattergram, you can say that in this sample families with high 
incomes tend to have relatively (larger /smaller) numbers of children. 


smaller 
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4. Construct a scattergram for the following data: 


Student No. Midterm Exam Grade Final Exam Grade 


74 
73 
71 


65 


68 


7\ 


80 
83 


a, 


75 


85 


85 


90 
94 
99 
98 


88 
95 


97 


100 


10 
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Your scattergram should look like the one below. 


[Figure: scattergram of final exam grade (vertical axis) against midterm exam grade (horizontal axis)] 


You might have reversed the axes so that the vertical dimension represented 
the midterm grade and the horizontal dimension, the final grade. 


When one measure may be used to predict another, it is customary to represent the 
predictor on the horizontal dimension (the x axis). 


In this case high scores on the midterm tend to go with (high/low) scores on the 
final exam. 


6. In each of the two scattergrams you have seen you could make a reasonably 
good representation of the relationship between the two measurements by 
drawing a straight line. This kind of relationship is called a linear relationship, 
as shown below. 

[Figure: the two scattergrams (children against income; final against midterm), each with a straight line drawn through the points] 
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Which of the scattergrams below shows a clear linear relationship? 


(a) (b) 


7. Sometimes a relationship between two measurements is curvilinear; that is, it 
is best described by a curved line. Which of the above scattergrams represents 


a curvilinear relationship? 


(b) 


8. A relationship that can be described by a straight line is called a ________ 
relationship. 


linear 


9. A relationship that is best described by a curved line is called a ________ 
relationship. 


curvilinear 


10. When there is no relationship between two variables, the scattergram will look 
something like this. 


The stronger the relationship, the more closely the points on your scattergram 
will approach some linear or curvilinear pattern. Which of the scattergrams 
below represents the stronger relationship between two variables? 


(a) (b) 
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11. Which of these two scattergrams represents the stronger relationship between 
two variables? 
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CORRELATION COEFFICIENTS 


If two measures have a linear relationship, it is possible to describe how strong the 
relationship is by means of a statistic called a correlation coefficient. The symbol for 
the correlation coefficient is r. The symbol for the corresponding population 
parameter is ρ (the Greek letter “rho”). 


There are various forms of correlation coefficient for specialized types of data. We 
explain only the basic type, which is called the Pearson product-moment correlation. 


12. Match the explanations of correlation coefficients to the appropriate 
illustrations below. 

[Figure: five scattergrams labeled (a) through (e)] 

Correlation 
Coefficient          Explanation                                               Figure 

r = +1.0             All scores are exactly on the line. High scores on one 
                     measure go with high scores on the other.                 ____ 

r between            High scores on one measure tend to go with high scores 
+1.0 and 0           on the other but the relationship is not perfect. 
(e.g., +0.5)                                                                   ____ 

r = 0                No relationship at all between the two measures.          ____ 

r between 0          High scores on one measure tend to go with low scores 
and −1.0             on the other but the relationship is not perfect. 
(e.g., −0.3)                                                                   ____ 

r = −1.0             All scores are exactly on the line. High scores on one 
                     measure go with low scores on the other.                  ____ 
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(d) 
(c) 
(a) 
(e) 
(b) 


The basic formulas for the correlation coefficient are 

$\rho = \dfrac{1}{N} \sum \left(\dfrac{x - \mu_x}{\sigma_x}\right)\left(\dfrac{y - \mu_y}{\sigma_y}\right)$ 

$r = \dfrac{1}{n - 1} \sum \left(\dfrac{x - \bar{x}}{s_x}\right)\left(\dfrac{y - \bar{y}}{s_y}\right)$ 


Let us apply this to a simple example. Suppose you are interested in the relation 
between the age of garden snakes (in months) and their length (in inches). You 
obtain a sample of three garden snakes of known age and you measure them 
with the following results: 


Snake No.      Age (x)          Length (y) 

    1             1                 4 
    2             2                 6 
    3             3                 8 
               $\bar{x} = 2$     $\bar{y} = 6$ 
               $s_x = 1$         $s_y = 2$ 


isr= 
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14. 


For each snake we must compute $\dfrac{x - \bar{x}}{s_x} \cdot \dfrac{y - \bar{y}}{s_y}$. Note carefully that 
$(x - \bar{x})/s_x$ tells us how many standard deviations a given snake’s age is away 
from the mean. It is a form of z score. What does $(y - \bar{y})/s_y$ tell us? 


How many standard deviations the snake’s length is away from the mean. 


15. Complete this table to find $\dfrac{x - \bar{x}}{s_x} \cdot \dfrac{y - \bar{y}}{s_y}$ for each snake. 


Age $x$   $x - \bar{x}$   $\dfrac{x - \bar{x}}{s_x}$   Length $y$   $y - \bar{y}$   $\dfrac{y - \bar{y}}{s_y}$   $\dfrac{x - \bar{x}}{s_x} \cdot \dfrac{y - \bar{y}}{s_y}$ 

   1                                   4 
   2                                   6 
   3                                   8 

$\bar{x} = 2$                       $\bar{y} = 6$ 
$s_x = 1$                           $s_y = 2$ 


Age $x$   $x - \bar{x}$   $\dfrac{x - \bar{x}}{s_x}$   Length $y$   $y - \bar{y}$   $\dfrac{y - \bar{y}}{s_y}$   $\dfrac{x - \bar{x}}{s_x} \cdot \dfrac{y - \bar{y}}{s_y}$ 

   1         −1          −1            4          −2          −1          +1 
   2          0           0            6           0           0           0 
   3         +1          +1            8          +2          +1          +1 


16. For this sample what is $\sum \dfrac{x - \bar{x}}{s_x} \cdot \dfrac{y - \bar{y}}{s_y}$? Add up the sum in your table. 

+1 + 0 + 1 = +2 

17. Compute r. 

$r = \dfrac{1}{n - 1} \sum \dfrac{x - \bar{x}}{s_x} \cdot \dfrac{y - \bar{y}}{s_y} = \dfrac{1}{3 - 1} \cdot 2 = +1.0$ 

18. A correlation coefficient of +1.0 means that for every pair of observations 
$(x - \bar{x})/s_x$ is exactly equal to $(y - \bar{y})/s_y$. When a snake’s length is two standard 
deviations above the mean, its age is ________. 

Two standard deviations above the mean 

19. If the correlation coefficient is +1.0, information about one measurement tells 
you precisely what the other measurement must be; for example, suppose that 
instead of three garden snakes, you have a large population. For age the mean is 
12 months, with a standard deviation of 4 months. For length the mean is 25 in., 
with a standard deviation of 8 in. The correlation between age and length is 
$\rho = +1.0$. You are going to pick one snake at random from this population. Can 
you predict accurately what its age will be? 
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20. 


21. 


22. 


23. 


No. Your best guess would be 12 months, but the odds are small that you will 
be precisely correct. 


Can you predict accurately what its length will be? 


You select a snake and determine its age. Can you predict accurately what its 
length will be? 


Yes. If you know how many standard deviations above or below the mean its 
age is, you know that its length must be the same number of standard deviations 
above or below the mean. 
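
The reasoning in these frames can be sketched in Python. The first part recomputes r for the three garden snakes from their standard scores; the second assumes the population figures given above (age: mean 12, standard deviation 4; length: mean 25, standard deviation 8) and a correlation of exactly +1.0. The function names are illustrative, not the book's.

```python
from statistics import mean, stdev

ages = [1, 2, 3]       # months
lengths = [4, 6, 8]    # inches


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# r = sum of products of paired z scores, divided by n - 1.
products = [zx * zy for zx, zy in zip(z_scores(ages), z_scores(lengths))]
r = sum(products) / (len(ages) - 1)


def predict_length(age_months):
    """Length implied by age when the correlation is exactly +1.0.

    Population values assumed from the text: age mean 12, sd 4;
    length mean 25, sd 8.
    """
    z_age = (age_months - 12) / 4
    return 25 + 8 * z_age      # same number of sds from the mean


print(r, predict_length(16), predict_length(8))
```

A snake one standard deviation older than average (16 months) is predicted to be one standard deviation longer than average (33 in.).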


If a correlation coefficient is −1.0, you can also predict one measure precisely 
by knowing the other. When r = −1.0, $(x - \bar{x})/s_x = -(y - \bar{y})/s_y$. An x score 
two standard deviations above the mean must be paired with a y score two 
standard deviations (above/below) the mean. 


You are studying reaction time in middle-aged drivers. The correlation between 
age and reaction time is —0.96. If you know the age of a driver from this 
population, can you state precisely what his reaction time will be? 


No. The relationship is strong but not perfect. 
On the basis of this correlation coefficient, describe in general terms how age 


is related to reaction time in the population. 


Higher ages tend to go with longer reaction times. 
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25% 


26. 


27. 


28. 


29. 


If you know a driver’s age, can you make a more accurate prediction of his 
reaction time than if his age were unknown? 


Yes 


In the case of the garden snakes we can say that the age of the snakes explains 
all of the variance in their lengths; that is, knowing the ages, we can predict 
their lengths precisely. In the case of the drivers does age account for all of the 
variance in their reaction times? 


No 


The square of the correlation coefficient tells us exactly what percentage of the 
variance of y is explained by x; for example, the percentage of the variance in 
reaction times explained by differences in age in our group of drivers is $r^2$ = 
____%. 


92% 


If the correlation between aptitude-test scores and grade-point averages in a 
population of students is 0.70, the aptitude-test scores explain ____% of the 
variance in academic performance. 


49% 


When we say that 49% of the variance in academic performance is “explained” 
by aptitude-test scores, it is important to remember that we are not necessarily 
describing a cause and effect relationship; we are only describing how one 
variable can be used to predict another; for example, could you use academic 
performance (grade-point averages) to predict aptitude-test scores? 


Yes 
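
The two percentages above follow directly from squaring r, as this small sketch shows. The function name is ours; the two correlations are the ones quoted in the text.

```python
def percent_variance_explained(r):
    """Percentage of the variance in one measure 'explained' by the other."""
    return 100 * r ** 2


# Squaring removes the sign, so r = -0.96 and r = +0.96 explain the same share.
drivers = percent_variance_explained(-0.96)
students = percent_variance_explained(0.70)

print(round(drivers), round(students))
```

Because the square is the same whichever measure is treated as the predictor, the calculation says nothing about which one is the cause.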




30. The correlation between grade-point averages and aptitude-test scores is +0.70. 
Academic performance therefore “explains” $r^2$ = 49% of the variance in 
________. 


aptitude-test scores 


31. A researcher is able to determine the total lifetime earnings and the age at 
which they died for an appropriate sample of working men. The correlation 
between total lifetime earnings and life-span is +0.80. The researcher 
concludes that poverty is a cause of premature death. Does his statistical 
information support this conclusion? Why? 


No. The correlation coefficient does not tell which measurement is cause and 
which effect, or if some other factor underlies both of the measured items. 
Early death probably reduces total lifetime earnings. Health problems might 
cause both early death and low earnings. All the researcher can conclude 
from the correlation is that early death and low lifetime earnings go together. 
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COMPUTING r 


For most practical purposes, correlation coefficients are calculated by computers. 
Nevertheless, at least once in your life you ought to compute one by hand so that 
you can check the honesty of your computer. 


32. It is not necessary to compute $\dfrac{x - \bar{x}}{s_x} \cdot \dfrac{y - \bar{y}}{s_y}$ for every pair of scores 
to compute r. In practice, less computation is required if you use the following 
formula: 


$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$ 


Courage! The formula is not so difficult as it looks. You can set up a table 
similar to those used for computing standard deviations or for analysis of 
variance. Use the information from the following table to complete the 
computation of r. (If you have a calculator, this is a good place to use it.) 


   $x$    $x^2$    $y$    $y^2$    $xy$ 
    1       1       2       4        2 
    2       4       4      16        8 
    4      16       5      25       20 
    5      25       7      49       35 
    5      25       8      64       40 
   17      71      26     158      105 


$r = \dfrac{5(105) - (17)(26)}{\sqrt{[5(71) - (17)^2][5(158) - (26)^2]}} = +0.957$ 
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33. Set up a table and compute r for the following data: 


   $x$    $y$ 
    5      4 
    6      3 
    1      2 
    4      6 
    2      3 


   $x$    $x^2$    $y$    $y^2$    $xy$ 
    5      25       4      16       20 
    6      36       3       9       18 
    1       1       2       4        2 
    4      16       6      36       24 
    2       4       3       9        6 
   18      82      18      74       70 


$r = \dfrac{5(70) - (18)(18)}{\sqrt{[5(82) - (18)^2][5(74) - (18)^2]}} = \dfrac{26}{\sqrt{[86][46]}} = +0.41$ 
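
The computational formula can be written as a small Python function, sketched here under the assumption that both lists have the same length and that neither is constant (otherwise the denominator is zero). The function name `pearson_r` is ours; the two data sets are those of frames 32 and 33.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from sums, sums of squares, and sum of products."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_frame_32 = pearson_r([1, 2, 4, 5, 5], [2, 4, 5, 7, 8])
r_frame_33 = pearson_r([5, 6, 1, 4, 2], [4, 3, 2, 6, 3])

print(round(r_frame_32, 2), round(r_frame_33, 2))
```

The intermediate sums inside the function correspond to the column totals of the hand tables, which makes it easy to find where a hand calculation went astray.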
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HYPOTHESIS TESTING 


As with other statistics, it is possible to draw conclusions about the correlations 
within a population on the basis of a sample by referring to a sampling distribution. 
That is, you can test the significance level of a correlation coefficient. 


34. If you look at pairs of observations in a small sample, you are quite likely to
find correlations by chance; for example, in the space below write the last seven
digits of your social security number in column a and write your telephone
number in column b. Now compute the correlation between the two sets of
observations. Refer to the formulas on page 253 if you wish or use


$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$


a (Social Security No.)        b (Telephone No.)
  x        x²                    y        y²          xy


Σx =       Σx² =       Σy =       Σy² =       Σxy =


You almost certainly obtained some value of r other than 0. The probability that
it was somewhere between +0.75 and −0.75 is 95%.


35. As you no doubt suspect, we can construct theoretical sampling distributions
for r under certain assumptions. Specifically, if we assume that we are drawing
random samples from a population in which x and y are both normally distributed
and have zero correlation, we can compute the probability of obtaining
various values of r. This information is presented in the table on page 275. What
is the title of the table?


Critical values of r. 


36. 


37. 


38. 


39. 


40.


We can use this table to test the null hypothesis that the population correlation
ρ is zero; for example, a researcher believes that there is a correlation
between the blood pressure and the amount of a certain chemical in the
blood. She knows that both measures are normally distributed. From a
random sample of 27 individuals she computes a correlation coefficient. What
is the null hypothesis?


Null hypothesis: ρ = 0


- - - - - - - - - - - - - - - - - - - -


What is the alternative?


In establishing a critical region, she will consider (both ends/one end) of the
sampling distribution.


- - - - - - - - - - - - - - - - - - - -


both ends


Let us use a 5% significance level. Since both ends of the sampling distribution
are being considered, we must find the critical value of r that corresponds to
0.025 for our sample size. What is the critical value?


r ≥ +0.396 or r ≤ −0.396

41. If the researcher obtains a value from her sample of r = −0.29, can she reject
the null hypothesis?


42. Work through one more example. You believe that the speed in words per 
minute and error rate in errors per page of typists are negatively correlated; 
that is, fast typists tend to make (more/fewer) errors than slow ones. 


- - - - - - - - - - - - - - - - - - - -


fewer

43. You will test this theory with a sample of 20 typists. Outline an appropriate
statistical test of the theory, using a significance level of 1%.

Null hypothesis
Alternative
Significance level
Critical region


Null hypothesis        ρ = 0
Alternative            ρ < 0
Significance level     α = .01
Critical region        r ≤ −.516
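Once the critical value has been read from the table, the decision itself is a simple comparison. The sketch below shows that comparison for one-ended and two-ended tests; the function name `in_critical_region` and the sample value −0.60 in the second call are hypothetical, introduced only for illustration.

```python
def in_critical_region(r, critical_r, tails):
    """Return True if the sample r falls in the critical region.

    critical_r is the positive value read from the table of critical values.
    tails is "both" (alternative: rho != 0), "lower" (rho < 0), or "upper" (rho > 0).
    """
    if tails == "both":
        return abs(r) >= critical_r
    if tails == "lower":
        return r <= -critical_r
    if tails == "upper":
        return r >= critical_r
    raise ValueError("tails must be 'both', 'lower', or 'upper'")

# Frame 41: r = -0.29 against the two-ended 5% critical value 0.396.
print(in_critical_region(-0.29, 0.396, "both"))    # False: cannot reject
# Frame 43: a hypothetical sample r of -0.60 against the one-ended 1% value 0.516.
print(in_critical_region(-0.60, 0.516, "lower"))   # True: reject
```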


44. Using the table of critical values of r involves two assumptions. 
(a) The two measures are distributed 


(b) The sample is chosen at 


normally 
random 


45. To test your theory about the relation between typing speed and error rate you
choose five very fast typists, five moderately fast typists, five moderately slow
typists, and five very slow typists for your sample. Does this procedure meet the
assumption of your statistical test? Why?
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47. 


No. This is not a random sample because not all typists have an equal chance of 
being selected. You cannot use the table of critical values of r, but you can use 
other statistical procedures as you will see in a moment. 


It is not uncommon for researchers doing exploratory studies to collect data on 
a large number of variables and use a computer program to calculate all 
possible correlation coefficients in a search for clues to meaningful relation- 
ships. Look at the following computer printout. 


PEARSON CORRELATION COEFFICIENTS


VARIABLE PAIR                  COEFFICIENT      N       P

FOODCOST WITH RENTCOST            .2598        45     .042
FOODCOST WITH TRNSCOST            .5798        45     .000
FOODCOST WITH PAYLVL              .5469        44     .000
RENTCOST WITH TRNSCOST           −.0111        45     .471
RENTCOST WITH PAYLVL             −.0249        44     .436
TRNSCOST WITH PAYLVL              .6858        44     .049


Which correlations are significant at the 1% level? 


FOODCOST WITH TRNSCOST and 
FOODCOST WITH PAYLVL 


Suppose you collect data from a sample of voters on a large number of 
variables—age, years of education, income, etc. You use a computer to 
calculate correlation coefficients among 12 variables (66 correlation coeffi- 
cients). You find that three of these are significant at the 5% level. Do your 
results support the theory that these three pairs of variables are correlated in 
the population from which the sample was drawn? Why?

- - - - - - - - - - - - - - - - - - - -


No. When you calculate 66 correlation coefficients you must expect 5% of them 
(3.3 on the average) to be significant at the 5% level by chance. 
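The arithmetic behind this answer is easy to reproduce. The sketch below counts the pairs and the number of "significant" coefficients expected by chance alone. The last line goes one step beyond the text and rests on an assumption of our own: it treats the 66 tests as independent, which is not strictly true when the same variables appear in many pairs.

```python
from math import comb

variables = 12
alpha = 0.05

pairs = comb(variables, 2)             # number of distinct pairs of variables
expected_by_chance = alpha * pairs     # coefficients expected to reach 5% by chance

# Rough chance that at least one coefficient is "significant" when no real
# correlations exist, ASSUMING the 66 tests are independent (an approximation).
p_at_least_one = 1 - (1 - alpha) ** pairs

print(pairs, round(expected_by_chance, 1), round(p_at_least_one, 2))
```

Finding three significant coefficients is therefore about what chance alone would produce.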


PREDICTION 


Often we wish to use one measure to predict another measure; for example, we may 
use aptitude tests to predict academic performance or job performance. We may use 
advertising volume to predict the sales of a product. We may use rainfall to predict 
the growth rate of crops. The technique for developing such predictions is called 
regression analysis. 


48. Consider the scattergram below. It represents the relation between a test of 
finger dexterity and productivity on an assembly job. 


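[Scattergram: productivity (vertical axis, 100 to 400 units per day) plotted against test score (horizontal axis), with a diagonal line drawn through the points.]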


The diagonal line is the straight line that best represents the relationship
between the two measures. You can use this line to obtain the best prediction of
productivity for any given test score; for example, if a person scores 3.75 on
the test, our best prediction of his productivity is 150 units per day. If a
person scores 8.0, our best prediction of his productivity is ______.


400 units per day

49. The line on the scattergram is called the regression line of y on x or, in this case,
the ______ of ______ on ______.


regression line 
productivity 
test scores 


50. The regression line of y on x indicates the best prediction of y for every value
of x on the basis of the sample used. If there is no relationship between the two
measures, the regression line does not add to your knowledge. If there is no
relationship between y and x in your sample (if r = 0), the best prediction of
y is always ȳ, no matter what value x has; for example, try to draw in the regression
line of y on x in the scattergram below. You want to draw a line that
shows that ȳ is the best prediction of y for every value of x.
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51. 


[Scattergram: the regression line of y on x is a horizontal line drawn at y = ȳ.]


If you use the regression line illustrated above to predict y, your prediction
will always be ȳ, no matter what value of x you start from.


When we use a regression line, we assume that the relationship between the 
two measures is linear. Would a straight regression line give good predictions if 
the relation between the two measures were strongly curvilinear? 


- - - - - - - - - - - - - - - - - - - -


No 
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52. 


53. 


It is important to draw a scattergram to make sure that the relationship 
between the two measures is linear, but it is not necessary to read from the 
scattergram to make predictions. 


The mathematical equivalent of reading the scattergram is accomplished by
using the following formula:


$$\hat{y} = \bar{y} + b(x - \bar{x})$$

$$b = r\,\frac{s_y}{s_x} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$


This formula gives the best possible prediction of y for any given value of x.
To use the formula you must find the values of x̄, ȳ, and b on the basis of
your sample data and then plug in whatever value of x you are interested in.


Let us apply this procedure. The following data represent advertising expenditures
(in thousands of dollars) and sales (in tens of thousands of dollars) for a
department store. Your problem is to predict the sales that will result from
spending $1600 on advertising.


Advertising Expense     Sales
(× $1000)               (× $10,000)
     5                      4
     6                      3
     1                      2
     4                      6
     2                      3


These numbers are the same as those used in frame 33, so you can save yourself
some arithmetic by looking back at your computations for that frame. First,
what are x̄ and ȳ?


Now compute b. Refer to the formula above and to your work on frame 33.
You have already computed most of the necessary terms.


- - - - - - - - - - - - - - - - - - - -
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55. 


- - - - - - - - - - - - - - - - - - - -


$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5(70) - (18)(18)}{5(82) - (18)^2} = \frac{26}{86} = 0.302$$


You want to predict the sales resulting from $1600 of advertising. What value
of x will you use? (Remember the data is given in thousands of dollars.)


Use the regression equation ŷ = ȳ + b(x − x̄) to predict the value of y.


ŷ = ______


ŷ = ȳ + b(x − x̄) = 3.6 + 0.302(1.6 − 3.6) = 3.6 − 0.604 = 3.0, or $30,000


On the basis of the same sample, what sales would you predict for $4600 of
advertising?


ŷ = ȳ + b(x − x̄) = 3.6 + 0.302(4.6 − 3.6) = 3.6 + 0.302 = 3.9, or $39,000
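The same prediction can be sketched in a few lines of code, which is a convenient way to check the hand computation of b and of the two predicted sales figures. The function names `fit_line` and `predict` are our own, not part of any statistical package.

```python
def fit_line(xs, ys):
    """Return (x_bar, y_bar, b) for the regression line of y on x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return sum_x / n, sum_y / n, b

def predict(x, x_bar, y_bar, b):
    """Best prediction of y for a given x: y_bar + b(x - x_bar)."""
    return y_bar + b * (x - x_bar)

advertising = [5, 6, 1, 4, 2]   # thousands of dollars
sales = [4, 3, 2, 6, 3]         # tens of thousands of dollars

x_bar, y_bar, b = fit_line(advertising, sales)
sales_at_1600 = predict(1.6, x_bar, y_bar, b)   # $1600 of advertising
sales_at_4600 = predict(4.6, x_bar, y_bar, b)   # $4600 of advertising
print(round(b, 3), round(sales_at_1600, 1), round(sales_at_4600, 1))
```

Both inputs are well inside the range of advertising expenses in the sample (1 to 6), which is where a regression line can reasonably be trusted.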
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57. 


When you use a regression equation to predict values of a measure, you are
subject to the danger that the relationship between the measures in your sample
is entirely or partly due to chance. By making certain complex assumptions
about the distribution of y it is possible to establish confidence intervals for
your predictions. These procedures are beyond the scope of this book. Another
useful check on regression equations is cross-validation. You can use the
regression equation developed with one sample to predict values of y for a new
sample and then check the accuracy of the predictions; for example, a
researcher is developing a method, based on an auditory discrimination test, of
predicting success in learning foreign languages. On the basis of his sample
he computes


x̄ = 10                    ȳ = 50               b = 5.0
(discrimination test)     (language test)


Write the appropriate regression equation for predicting y.


ŷ = ______


ŷ = ȳ + b(x − x̄) = 50 + 5.0(x − 10)


To cross-validate his regression equation he will use (a new sample/the same
sample).


a new sample




59. In his new sample the researcher obtains discrimination test scores of 5, 6, 8,
12, and 15. What language learning scores will he predict? Apply his regression
equation.


  x     y (predicted)


  x     y (predicted)
  5         25
  6         30
  8         40
 12         60
 15         75


60. The actual scores he finds are as follows:


  x     y (predicted)     y (actual)
  5         25                23
  6         30                35
  8         40                41
 12         60                58
 15         75                75


His predictions appear to be relatively (accurate/inaccurate).


accurate
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61. 


Describe cross-validation briefly in your own words. 


Your answer should have included these points: 


(a) Obtain new sample. 
(b) Use regression equation to predict scores. 
(c) Compare predicted scores with actual scores. 


REVIEW PROBLEMS 


If you have successfully completed this chapter, you can now statistically describe 
and test the relationship between two measures. You can: 


draw a scattergram;

compute the correlation coefficient and test the null hypothesis that ρ = 0;

use a regression equation to predict one measure on the basis of another,
correlated measure.


Now try these review problems. Table I on pp. 253-256 lists any formulas you
may need for reference.


Draw a scattergram for the following data. 


by ¢ Syl Os OI 
l 


X 
Vi Nh 53Ay 35 sth 
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? 


2. Describe the relation between x and y in the data for problem 1. Is a correlation
coefficient an appropriate measure of the strength of the relationship?


3. Compute r for the following data:


x    2, 3, 3, 4, 5, 5, 6
y    7, 6, 5, 4, 3, 2, 1


4. For the data in problem 3 outline an appropriate statistical test for the null
hypothesis that there is no correlation between the populations of x and y scores.
Use a 1% significance level. Is the r obtained in problem 3 significant?
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5. Use the data for problem 3 to establish a regression equation to predict y. If x = 5, 
what is your best prediction for y? 


Answers 


To review a problem, study the frames indicated after the answer. 


1. [Scattergram of the data for problem 1; the x axis runs from 1 to 11.]


See frames 1 to 4.
2. The relationship is curvilinear; therefore r is not an appropriate measure of the
strength of the relationship. See frames 6 to 11.

3.
  x     x²     y     y²     xy
  2      4     7     49     14
  3      9     6     36     18
  3      9     5     25     15
  4     16     4     16     16
  5     25     3      9     15
  5     25     2      4     10
  6     36     1      1      6
 28    124    28    140     94

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{7(94) - (28)(28)}{\sqrt{\left[7(124) - 784\right]\left[7(140) - 784\right]}} = \frac{-126}{\sqrt{16464}} = \frac{-126}{128} = -0.98$$

See frames 32 to 33.
4. Null hypothesis        ρ = 0
   Alternative            ρ ≠ 0
   Significance level     α = 0.01
   Critical region        r ≥ +0.875 or r ≤ −0.875


Since r = −0.98, the null hypothesis is rejected. See frames 34 to 45.


5. ŷ = ȳ + b(x − x̄) = 4 − 1.5(x − 4)
   ŷ = 4 − 1.5(5 − 4) = 2.5 if x = 5.


See frames 48 to 59.


CHAPTER EIGHT 
A Test of Distributions 


Many kinds of data come in the form of counts or categories. For example, think of
the sets at the start of Chapter 1, “blue eyes, brown eyes, green eyes...” or “station
wagon, sports car...”. Using a sampling distribution called chi square (χ²), it is
possible to determine the probability that any given sample was drawn from a
population with a given distribution.

In using chi square for formal hypothesis testing, the important thing is to be 
able to come up with an appropriate population model for your null hypothesis. 
With a little ingenuity, it is possible to use chi square tests in many cases where 
other kinds of statistical tests are of doubtful applicability, for example when 
measurements do not have normal distributions or equal variances. 

When you have completed this chapter, you will be able to: 


• perform a χ² test with data on a single variable;
• perform χ² tests for data on two variables;
• use χ² tests as a substitute for analysis of variance or t tests when appropriate.


THE CHI SQUARE TEST OF A DISTRIBUTION 


The χ² distribution (χ is the Greek letter chi) is a theoretical sampling distribution that
allows you to test the assumption that a sample was drawn from a population with a 
given distribution. It allows you to compare a sample distribution with a population 
distribution derived from a theory or null hypothesis and decide whether the sample 
could reasonably be a random sample from that population. 


1. On the basis of genetic theory you predict that a particular population of 
guinea pigs should be 40% brown, 40% spotted, and 20% white. What distribu- 
tion of colors would you expect in a sample of 50 guinea pigs? Complete the 
table below: 

                      Brown    Spotted    White
Predicted number
of guinea pigs


- - - - - - - - - - - - - - - - - - - -


                      Brown    Spotted    White
Predicted number
of guinea pigs          20        20        10
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2. 


In a sample of 50 guinea pigs you find 30 brown, 15 spotted, and 5 white. This
observation is compared with your theoretical prediction in the following table:

              Brown    Spotted    White
Predicted       20        20        10
Observed        30        15         5


Can the differences from what you predicted be accounted for by chance or
will you have to reject the theory on which your predictions are based? To
find out you can compute a value of χ².


The formula for χ² is


$$\chi^2 = \sum \frac{(f - F)^2}{F}$$


In this formula F is the predicted frequency for a given category (or “cell”) and
f is the observed frequency. For our example we compute as follows:


  f      F     (f − F)    (f − F)²    (f − F)²/F
 30     20      +10         100          5.00
 15     20       −5          25          1.25
  5     10       −5          25          2.50

                                    χ² = 8.75


What is the value of χ²?


If the differences between the theoretically predicted frequencies (F) and the
observed frequencies (f) tend to be large, χ² will be (large/small).

What would a small value of χ² indicate?


- - - - - - - - - - - - - - - - - - - -


The observed frequencies tend to be close to the theoretical predictions; (f − F)²
tends to be small.


Look at the formula for χ². Is a negative value of χ² possible?


No. (f − F)² can never be negative and F is always positive, so χ² can never be negative.


To find the probability of obtaining by chance a value of χ² as large as 8.75 you
can refer to the table on page 276. Look at the table now. What additional
information do you need to use the table?


You need to know the df, the degrees of freedom.


In this case the degrees of freedom for χ² are the number of categories minus 1.
What degrees of freedom will you use for this problem?

8. What is the probability of a χ² as large as 8.75?


Less than 0.025 (between 0.025 and 0.01)
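The guinea pig example can be reproduced with a short sketch. The function name `chi_square` is our own. The final line uses a shortcut that holds only when df = 2: in that one case the probability of a χ² at least as large as the observed value is exactly e^(−χ²/2), so no table is needed. For any other df you would still consult the table on page 276.

```python
from math import exp

def chi_square(observed, predicted):
    """Sum of (f - F)^2 / F over all categories."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, predicted))

proportions = [0.40, 0.40, 0.20]   # brown, spotted, white (genetic theory)
observed = [30, 15, 5]

n = sum(observed)
predicted = [p * n for p in proportions]   # 20, 20, and 10 guinea pigs
chi2 = chi_square(observed, predicted)
df = len(observed) - 1

# Closed form valid ONLY for df = 2: P(chi-square >= chi2) = exp(-chi2 / 2).
p_value = exp(-chi2 / 2)

print(chi2, df, round(p_value, 4))
```

The computed probability falls between 0.01 and 0.025, in agreement with the answer read from the table.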


9. You believe that people are most likely to attempt suicide on a weekend. To
test this theory, you have obtained reports of attempted suicides from the
police. In the past two years, 147 attempts were reported, distributed as
follows:

Day                    Sun.   Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Total
Number of attempts      32     10     13      13       4      40     35     147


An appropriate null hypothesis for a statistical test would be that equal
numbers of attempts are made on all days of the week. Use this theory to
compute predicted frequencies for the following table. Divide the total equally
among the groups.


              Sun.   Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.
Predicted


- - - - - - - - - - - - - - - - - - - -


              Sun.   Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.
Predicted      21     21     21      21      21      21     21


(Since there are seven categories, the predicted number for each category
would be 1/7 of the total.)


10. 


11.


12.
What are the degrees of freedom for χ² in this case?


Compute χ². Can you reject the null hypothesis?
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13. 


  f      F     (f − F)    (f − F)²    (f − F)²/F
 32     21       11         121          5.8
 10     21      −11         121          5.8
 13     21       −8          64          3.0
 13     21       −8          64          3.0
  4     21      −17         289         13.8
 40     21       19         361         17.2
 35     21       14         196          9.3

                                        57.9
χ² = 57.9


The null hypothesis can be rejected.


There is one important fact to note about this statistical test. The order of the
categories has no effect on the value of χ². Only the amount of the differences
matters. As a result, you would have obtained the same value of χ² if your
distribution had been the following:


Day                   Sun.   Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.
Number of reports      13     13     35      40      32      10      4


Would these results have supported your theory? Why?


No. These data indicate that suicide attempts are made in mid-week.
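A short sketch makes the point concrete: the two distributions give the same χ², yet only the first agrees with the weekend theory. The function name `chi_square` is our own, and counting Saturday and Sunday as "the weekend" is an assumption made for this illustration.

```python
from math import isclose

def chi_square(observed, predicted):
    """Sum of (f - F)^2 / F over all categories."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, predicted))

# Order of days: Sun., Mon., Tues., Wed., Thurs., Fri., Sat.
original = [32, 10, 13, 13, 4, 40, 35]
reordered = [13, 13, 35, 40, 32, 10, 4]

predicted = [sum(original) / 7] * 7        # 21 attempts per day under the null

chi2_original = chi_square(original, predicted)
chi2_reordered = chi_square(reordered, predicted)
print(round(chi2_original, 1), round(chi2_reordered, 1))
print(isclose(chi2_original, chi2_reordered))

# The direction of the differences must be checked separately.
# ASSUMPTION: "weekend" means Saturday plus Sunday (first and last entries).
weekend_original = original[0] + original[-1]
weekend_reordered = reordered[0] + reordered[-1]
print(weekend_original, weekend_reordered, 2 * 21)   # observed, observed, predicted
```

The statistic tells you only that the distribution departs from the null hypothesis; whether it departs in the direction your theory predicts is something you must read from the data.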


14. When you use a χ² test to reject a null hypothesis, you must always look back
at the data to make sure that they support your alternative. You have taken a
census of the insect population of your rose garden. On the basis of several large
samples you conclude that the insect population is distributed as follows:


Ladybugs          20%
Inch worms        20%
Weevils           30%
Aphids            25%
Brown spiders      5%


Now you treat your garden with an insecticide that is supposed to control the
undesirable weevils, aphids, and brown spiders without affecting ladybugs or
inch worms.


To check the effect of the insecticide you collect 150 insects at random. Your
sample is composed as follows:


Ladybugs          25
Inch worms        45
Weevils           45
Aphids            25
Brown spiders     10


Use the census to determine the predicted frequencies. Compute χ² and answer
these two questions:


1. Has the distribution of insects changed significantly (at the 5% level)?
2. Has the insecticide had the intended effect?


- - - - - - - - - - - - - - - - - - - -


  f       F      (f − F)    (f − F)²    (f − F)²/F
 25     30         −5         25           0.83
 45     30         15        225           7.50
 45     45          0          0           0
 25     37.5     −12.5       156.25        4.17
 10      7.5       2.5         6.25        0.83
                                          13.33
χ² = 13.33
df = 4


Critical region χ² ≥ 9.49


(a) The distribution of insects has changed significantly.

(b) The insecticide does not have the intended effect. In theory the proportions
of ladybugs and inch worms should increase as the proportions of
weevils, aphids, and brown spiders decrease. Instead, weevils were
unaffected, ladybugs decreased, and brown spiders increased. Only inch
worms and aphids changed in the predicted direction.


CHI SQUARE TEST WITH TWO VARIABLES 


Chi square can also be used to test hypotheses about distributions based on two 
variables. To do so we must establish a set of categories based on the two variables 
and predict the frequency of observations in each category. 


15. A researcher believes that people tend to choose mates with the same hair
color. He can test this theory by constructing a distribution based on two
variables: the husband’s hair color and the wife’s hair color. His table will look
like this:

Wife                      Husband
             Red    Blond    Black    Brown
Red
Blonde
Black
Brown


How many cells or categories are in this table?


16. Each possible combination of husband and wife is a category.


16. 


17. 


18. 
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The researcher’s first problem is to calculate a predicted frequency for each 
cell. To do this he must consider the numbers of men and women in each 
category. He has observed, let us say, 500 couples. The numbers of men and 
women in each category are indicated on the edges of the following table. In 
addition, the percentage of men in each category is listed at the bottom for 
reasons you will understand shortly. 


Wife                          Husband
             Red    Blond    Black    Brown    Total
Red                                               50
Blonde                                           150
Black                                            150
Brown                                            150
Total         50     100      150      200       500
%            10%     20%      30%      40%      100%

The totals along the edge of the table are called marginals. Look at the marginals 
to find out how many of the wives are blondes. 


150 


Forty percent of all the husbands have brown hair. If there is no relation 
between the husband’s hair color and the wife’s hair color, approximately 40% 
of the women in each category should have husbands with brown hair; for 
example, of the 150 blondes, how many should have husbands with brown hair? 


0.40 x 150 = 60 
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19. Ten percent of all husbands have red hair. How many of the 50 red-headed 
wives should have red-headed husbands?


0.10 × 50 = 5


20. How many of the brown-haired wives should have blond husbands? 


0.20 x 150 = 30 


21. Apply this method to fill in predicted frequencies in the table. 


Wife                            Husband
             Red     Blond    Black    Brown    Total      %
Red          F =     F =      F =      F =         50     10%
Blonde       F =     F =      F =      F =        150     30%
Black        F =     F =      F =      F =        150     30%
Brown        F =     F =      F =      F =        150     30%
Total         50     100      150      200        500
%


- - - - - - - - - - - - - - - - - - - -


  5     10     15     20

 15     30     45     60

 15     30     45     60

 15     30     45     60
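Every predicted frequency in this table comes from the same rule: row total times column total, divided by the grand total (which is the same as applying the husband's percentage to the wife's row total). The sketch below applies that rule to the marginals; the variable names are our own.

```python
wife_totals = {"Red": 50, "Blonde": 150, "Black": 150, "Brown": 150}
husband_totals = {"Red": 50, "Blond": 100, "Black": 150, "Brown": 200}
n = 500   # couples observed

# Predicted frequency for a cell = (row total x column total) / grand total.
predicted = {
    wife: {husband: w_total * h_total / n
           for husband, h_total in husband_totals.items()}
    for wife, w_total in wife_totals.items()
}

for wife, row in predicted.items():
    print(f"{wife:7s}", [row[h] for h in husband_totals])
```

Because every cell is built from the marginals, each row and column of predicted frequencies adds up to the same total as the observed data.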


22. Degrees of freedom in this case depend on the number of categories in each 
direction. They are (c — 1)(r — 1), the number of columns minus one times the 
number of rows minus one. Columns are the categories of husbands. How many 
columns are there in this problem? 
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23; 


24. 


The rows in this problem are the categories for 


(4 − 1)(4 − 1) = 9

25. We have added the observed frequencies to the table. Compute χ² and answer
two questions.


(a) Is χ² significant at the 1% level?
(b) Does the data support the researcher’s theory that people tend to select
mates with the same hair color?


Wife                            Husband
                 Red    Blond    Black    Brown    Total
Red       F        5      10       15       20       50
          f       10      10       10       20
Blonde    F       15      30       45       60      150
          f       10      40       50       50
Black     F       15      30       45       60      150
          f       13      25       60       52
Brown     F       15      30       45       60      150
          f       17      25       30       78

Total             50     100      150      200      500


- - - - - - - - - - - - - - - - - - - -


  f      F     (f − F)    (f − F)²    (f − F)²/F
 10      5        5          25          5.00
 10     15       −5          25          1.67
 13     15       −2           4          0.27
 17     15        2           4          0.27
 10     10        0           0          0
 40     30       10         100          3.33
 25     30       −5          25          0.83
 25     30       −5          25          0.83
 10     15       −5          25          1.67
 50     45        5          25          0.56
 60     45       15         225          5.00
 30     45      −15         225          5.00
 20     20        0           0          0
 50     60      −10         100          1.67
 52     60       −8          64          1.07
 78     60       18         324          5.40
                                   χ² = 32.57


(a) χ² is significant; df = 9. The critical region is χ² ≥ 21.67.

(b) The data supports the theory because the observed frequency of matched
husband and wife couples is greater than predicted in all cases, whereas
most of the other observed frequencies are equal to or less than the
prediction.


26. As a rule, to use the χ² test the predicted frequency for each cell should be at
least 5.


There are exceptions to this rule but they are beyond the scope of this book.


Could you have performed the χ² test in frame 18 if you had a sample of 100
couples?


No. Many of the predicted frequencies would be less than 5.
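The whole two-variable procedure, including the check on small predicted frequencies, can be sketched as follows. The function names are our own. The "100 couples" table is hypothetical: it simply scales the predicted frequencies down by a factor of five, assuming the same marginal percentages.

```python
def expected_table(observed):
    """Predicted frequencies from the marginals: row total x column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def chi_square_table(observed):
    """Return (chi-square, degrees of freedom, predicted frequencies)."""
    expected = expected_table(observed)
    chi2 = sum((f - F) ** 2 / F
               for f_row, F_row in zip(observed, expected)
               for f, F in zip(f_row, F_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

# Rows: wives (red, blonde, black, brown); columns: husbands (same order).
couples = [
    [10, 10, 10, 20],
    [10, 40, 50, 50],
    [13, 25, 60, 52],
    [17, 25, 30, 78],
]
chi2, df, expected = chi_square_table(couples)
# Unrounded value is about 32.56; rounding each term by hand gives 32.57.
print(round(chi2, 2), df)

# Rule of 5 with a HYPOTHETICAL sample of 100 couples (same percentages).
small_sample = [[F / 5 for F in row] for row in expected]
cells_below_5 = sum(F < 5 for row in small_sample for F in row)
print(cells_below_5, "of 16 predicted frequencies are below 5")
```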
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28. 


29. 


You are investigating the sensitivity of goldfish to industrial pollutants in their
water. Some fish appear to be very sensitive. They turn white and die instantly.
Others are moderately sensitive. They show signs of distress, but do not die,
and many show no reaction at all. Since you are working with both black and
red goldfish, you would like to know if there is evidence of a difference in
sensitivity for the two colors. The following table classifies a group of goldfish
according to color and sensitivity. Use the marginals to calculate predicted
frequencies on the assumption that there is no relation between color and
sensitivity and fill them in in the table.

                       Sensitivity
Color          High    Medium    Low    Total      %
Red      F
         f       4        4       22      30      60%
Black    F
         f       6        6        8      20      40%
Total           10       10       30      50


- - - - - - - - - - - - - - - - - - - -


                       Sensitivity
Color          High    Medium    Low    Total      %
Red      F       6        6       18
         f       4        4       22      30      60%
Black    F       4        4       12
         f       6        6        8      20      40%
Total           10       10       30      50


No. Two of the predicted frequencies are less than 5.


When the predicted frequencies are less than 5, it is sometimes possible to 
combine categories so that the predicted frequencies are large enough. In this 
case you could logically combine two of the sensitivity categories in order to 
obtain predicted frequencies larger than 5. Prepare a new table that does this. 
Your table will now have only 4 cells. 
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30. 


31. 


                       Sensitivity
Color          High or Moderate    Low    Total      %
Red      F           12             18
         f            8             22      30      60%
Black    F            8             12
         f           12              8      20      40%
Total                20             30      50


What are the degrees of freedom for this new table?


(2 − 1)(2 − 1) = 1
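Combining categories is just a matter of adding columns before the predicted frequencies are computed. The sketch below merges the High and Medium columns of the goldfish table and confirms that the smallest predicted frequency rises from below 5 to above 5. The helper names are our own.

```python
def expected_table(observed):
    """Predicted frequencies from the marginals: row total x column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def combine_first_two_columns(table):
    """Merge the first two categories (here High and Medium) into one."""
    return [[row[0] + row[1]] + row[2:] for row in table]

# Rows: red, black; columns: high, medium, low sensitivity.
goldfish = [[4, 4, 22],
            [6, 6, 8]]

smallest_before = min(min(row) for row in expected_table(goldfish))

collapsed = combine_first_two_columns(goldfish)
smallest_after = min(min(row) for row in expected_table(collapsed))
df = (len(collapsed) - 1) * (len(collapsed[0]) - 1)

print(smallest_before, collapsed, smallest_after, df)
```

Which categories to merge is a judgment about the subject matter, not about the arithmetic: High and Medium both describe fish that reacted, so combining them keeps the table meaningful.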
When you use a computer to calculate chi square you must still decide on the 
predicted frequencies and often compute them by hand. 


Study the following printout. 
CHI SQUARE TEST


               CASES        CASES
CATEGORY      OBSERVED     EXPECTED     RESIDUAL
    1            29          22.00         7.00
    2            19          20.00        −1.00
    3            18          18.00         0.00
    4            25          18.00         7.00
    5            17          18.00        −1.00
    6            10          18.00        −8.00
    7            15          16.00        −1.00
    8            11          14.00        −3.00

TOTAL           144

CHI SQUARE       DF      SIGNIFICANCE

   9.316          7         0.231


What label corresponds to f in our formula? ______
What label corresponds to F? ______ How often would
you expect to obtain the results in the printout by chance? ______


f = CASES OBSERVED

F = CASES EXPECTED

p = 0.231. You would expect to obtain a value of χ² this great by chance
about 23% of the time.
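You can confirm that the printout uses the same formula you have been applying by hand. The sketch below recomputes the RESIDUAL column and the CHI SQUARE value from the observed and expected cases. It assumes the observed count for category 8 is 11, the value implied by that row's residual of −3.00 and by the total of 144.

```python
observed = [29, 19, 18, 25, 17, 10, 15, 11]   # CASES OBSERVED (f)
expected = [22, 20, 18, 18, 18, 18, 16, 14]   # CASES EXPECTED (F)

residuals = [f - F for f, F in zip(observed, expected)]            # RESIDUAL column
chi2 = sum((f - F) ** 2 / F for f, F in zip(observed, expected))   # CHI SQUARE
df = len(observed) - 1                                             # DF

print(residuals)
print(sum(observed), round(chi2, 3), df)
```

The SIGNIFICANCE value is the one figure the sketch does not reproduce; for that you would look up χ² = 9.316 with 7 degrees of freedom in the table of critical values.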
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WHEN TO USE CHI SQUARE 


The t test for comparing two means and the analysis of variance both involve
assumptions that the populations are normally distributed and have equal variances.
A χ² test can often be used when the data does not meet these assumptions. Instead
of using the measurements to calculate a mean, you simply use them to categorize
individual observations; for example, as “high,” “medium,” and “low” or above
and below the mean. The choice of categories should be guided by your theory and
by the possibility of constructing an appropriate set of theoretical frequencies.


32. Here is a problem you considered earlier. The following is a summary of the 
adjustment scores of mental patients treated with three different drugs. 


Group 1        Group 2        Group 3
x̄ = 5          x̄ = 10         x̄ = 4
s² = 4         s² = 23        s² = 5
n = 10         n = 33         n = 45


Analysis of variance is not appropriate because the variances of the three
groups are unequal. What categories could you use to set up a χ² test?


- - - - - - - - - - - - - - - - - - - -


Group 1, Group 2, Group 3, high adjustment score, and low adjustment score.


33. When you can use a t test or analysis of variance, it is better to do so because
the power of these tests is greater than the power of a χ² test; that is, you are
more likely to be able to reject the null hypothesis if your theory is true. You
are interested in whether educational background makes a difference in scores
on a trouble-shooting test. You have three samples of 25 people each with the
following backgrounds: college, high school, and technical institute. The
trouble-shooting scores appear to be normally distributed and the sample
variances are all quite similar. Could you use χ² for examining this data?


Yes, by using the trouble-shooting scores to categorize individuals, for example,
as high, medium, and low.
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34. Could you use analysis of variance for examining this data? 


35. 


Yes 


Analysis of variance 


You want a quick, informal check on approximately how unusual a distribution
is. Either χ² or analysis of variance could be used. Which one will you choose?


We would choose χ² because it is usually easier to compute, and precise formal
hypothesis testing is not required in this case.


REVIEW PROBLEMS 


If you have successfully completed this chapter, you can now use the chi square 
distribution to test hypotheses about the distribution of a population. You can: 


develop an appropriate theoretical population distribution and perform a χ² test
with frequency data on a single variable;

develop an appropriate theoretical population distribution and perform a χ² test
with frequency data on two variables;

recognize situations in which χ² can be used as a substitute for analysis of
variance or t tests.


Now try these review problems. Table I on pp. 253-256 lists any formulas you
may need for reference.


1. A factory has four machines that produce molded parts. A sample of 500 parts
is collected for each machine and the number of defective parts in each sample
is determined:


Machine          1     2     3     4
Defects/500     10    25     0     5


Is there a difference between the machines? Outline an appropriate statistical
test and, if possible, determine if these data are significant.

2. The situation is the same as in problem 1 but you are measuring the mean
diameter of the parts. You suspect that the mean diameters for parts produced
on the four machines are not the same. What would be your first choice of a
statistical test? What assumptions would be required?


3. The dropout rate among volunteer workers in community development pro- 
grams varies widely. The theory is advanced that the degree of involvement of 
volunteers in setting the goals of the program influences the dropout rate. In 
a new program 27 volunteers are selected to participate in a series of goal- 
setting workshops as part of their activities, whereas 23 others are simply 
assigned tasks. At the end of two months the results are as follows: 


Remained in Dropped 


| Program Out Total 
Workshop group 18 9 27 
No workshop group 10 13 23 
Total 28 22 


Are these results significant? Outline an appropriate statistical test and compute 
the necessary Statistic. 
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4. As part of an attitude survey, a sample of men and women 1s asked to rate a 
number of statements on a scale of 1 to 5, according to whether they agree or 
disagree. The following are the results for one of the statements. 


           Agree                             Disagree 
           Strongly                          Strongly    Total 
              1       2       3       4         5 
Women         3      13      10      16         7         50 
Men           2      12      26      10         1         50 
Total         5      25      36      26         8        100 


Use χ² to determine whether there is a significant difference between the 
answers of men and women. 


Answers 


To review a problem, study the frames indicated after the answer. 


1. You can use χ² for this problem. The theoretical distribution would call for an 
equal number of defects from each machine. 


Machine     1     2     3     4    Total 
F          10    10    10    10     40 
f          10    25     0     5     40 

F is greater than 5 for all cells. 

 f     F    (f − F)   (f − F)²   (f − F)²/F 
10    10       0         0          0 
25    10      15       225         22.5 
 0    10     −10       100         10.0 
 5    10      −5        25          2.5 
                              χ² = 35.0 
                              df = 3 


For α = 0.05 the critical region is χ² ≥ 7.81. 
For α = 0.01 the critical region is χ² ≥ 11.34. 
There is a significant difference among machines. See frames 1 to 11. 
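
The arithmetic in answer 1 can be checked with a few lines of Python. This is only a sketch: the function name `chi_square_goodness_of_fit` is made up for illustration, and the predicted frequencies assume, as the answer does, that defects are spread equally over the four machines.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return the chi-square statistic: the sum of (f - F)^2 / F over all cells."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))

observed = [10, 25, 0, 5]                  # defects per 500 parts, machines 1 to 4
total = sum(observed)                      # 40 defects in all
expected = [total / len(observed)] * len(observed)   # equal share for each machine

chi2 = chi_square_goodness_of_fit(observed, expected)
df = len(observed) - 1                     # number of categories minus one

print(f"chi-square = {chi2:.1f}, df = {df}")   # chi-square = 35.0, df = 3
```

Because 35.0 is larger than both critical values quoted in the answer (7.81 and 11.34), the null hypothesis of equal defect rates is rejected at either level.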
2. Your first choice should be analysis of variance, since you are dealing with 
measurements rather than count data. This test will give you the greatest power 
and the best chance of rejecting the null hypothesis if your theory is true. To use 
analysis of variance you must assume that the populations of measurements 
for all machines are approximately normally distributed and have approximately 
equal variances. See frames 32 to 36 and Chapter 6, frames 69 to 74. 

3. χ² is an appropriate test. The null hypothesis is that the workshop has no 
effect on the dropout rate and that equal proportions drop out of both groups. 
The alternative is that different proportions will drop out. The critical region for 
α = 0.05 is χ² ≥ 3.84 (df = 1). 


 f      F       f − F    (f − F)²   (f − F)²/F 

18    15.12     2.88      8.29       0.548 
10    12.88    −2.88      8.29       0.644 
 9    11.88    −2.88      8.29       0.698 
13    10.12     2.88      8.29       0.819 

                               χ² = 2.709 


The results are not significant. 
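
As a cross-check on answer 3, the predicted frequencies and χ² can be computed directly from the 2 × 2 table. The sketch below is illustrative only; the helper names `expected_frequencies` and `chi_square` are assumptions, not anything defined in this book. Each predicted frequency is (row total × column total) / grand total.

```python
def expected_frequencies(table):
    """Predicted frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Chi-square statistic for a table of observed frequencies."""
    expected = expected_frequencies(table)
    return sum((f - F) ** 2 / F
               for row_f, row_F in zip(table, expected)
               for f, F in zip(row_f, row_F))

# Rows: workshop group, no workshop group. Columns: remained, dropped out.
observed = [[18, 9],
            [10, 13]]

expected = expected_frequencies(observed)
chi2 = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)                                 # predicted frequencies
print(f"chi-square = {chi2:.2f}, df = {df}")    # about 2.71 with df = 1
```

The unrounded result is about 2.710; the answer shows 2.709 because each term was computed from the rounded value 8.29. Either way the statistic falls below 3.84, so the results are not significant.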


4. Half of the sample is women and half men. If there is no difference in their 
responses, you would expect equal numbers of men and women in each category 
of response. The predicted frequencies on this basis are 


            1       2      3      4     5 
Women     2.5    12.5     18     13     4 
Men       2.5    12.5     18     13     4 


Because four of the predicted frequencies are less than 5, you must combine 
categories. By combining categories 1 and 2, and 4 and 5, you may compute χ². 


              1 and 2     3     4 and 5 

Women    F      15       18       17 
         f      16       10       23 
Men      F      15       18       17 
         f      14       26       11 

 f     F    (f − F)   (f − F)²   (f − F)²/F 
16    15       1         1         0.067 
14    15      −1         1         0.067 
10    18      −8        64         3.556 
26    18       8        64         3.556 
23    17       6        36         2.118 
11    17      −6        36         2.118 
                             χ² = 11.482 


For α = 0.01 the critical region is χ² ≥ 9.210 (df = 2). The difference is 
significant. See frames 26 to 30. 
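
The step of combining categories before computing χ² can be sketched as follows. The helper name `combine` is hypothetical, and the predicted frequencies follow the reasoning of the answer: because the two groups are the same size, each sex is expected to supply half of every column total.

```python
women = [3, 13, 10, 16, 7]     # ratings 1 to 5
men = [2, 12, 26, 10, 1]

def combine(counts):
    """Merge ratings 1-2 and ratings 4-5, leaving rating 3 on its own."""
    return [counts[0] + counts[1], counts[2], counts[3] + counts[4]]

# Before combining, some predicted frequencies are below 5.
expected_before = [(w + m) / 2 for w, m in zip(women, men)]

women_c, men_c = combine(women), combine(men)
expected = [(w + m) / 2 for w, m in zip(women_c, men_c)]

chi2 = sum((f - F) ** 2 / F
           for row in (women_c, men_c)
           for f, F in zip(row, expected))
df = (2 - 1) * (len(expected) - 1)

print(expected_before)                          # [2.5, 12.5, 18.0, 13.0, 4.0]
print(women_c, men_c, expected)
print(f"chi-square = {chi2:.2f}, df = {df}")    # about 11.48 with df = 2
```

The unrounded statistic is about 11.48, which agrees with the 11.482 in the answer apart from rounding of the individual terms.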


CHAPTER NINE 
The Combined Effects of 
Two Variables 


Analysis of variance can be extended to study the combined effect of two 
experimental treatments on a measure. This method of analysis allows for the 
design of experiments that are particularly efficient. That is, you can test a large 
number of hypotheses from relatively few experimental observations. Despite what 
you may feel at the end of this book, experimental observations are usually more 
work than statistical analysis. As a result, two-way analysis of variance is a very 
popular technique for experimenters. 

When you have completed this chapter, you will be able to perform and interpret 
a simple two-way analysis of variance. 


1. Often a researcher is interested in the combined effects of two 
variables on a third measure; for example, consider this problem. A researcher 
is trying to determine the conditions that will produce the greatest yield 
from melon vines. He suspects that both the amount of fertilizer used and 
the amount of water the plants receive will influence the number of melons 
on a vine. In this case, the two variables whose effect is under study are 
________ and ________. 


amount of fertilizer 
amount of water 


2. The measure under study is ________. 


number of melons per vine 
3. To study the effect of the amount of water alone the researcher might cultivate 
20 melon vines with heavy watering and 20 with light watering and count the 
number of melons per vine in each group. What sort of statistical test could 
he use? What assumptions would be involved? 


He could use a t test for the difference between two means, assuming that 
the number of melons per vine is approximately normally distributed and that 
the variances would be approximately equal under the two conditions. If 
he could not make these assumptions, he could use a χ² test by counting the 
number of vines under each condition that produced above or below average 
numbers of melons. 


4. To study the effect of amount of fertilizer alone he might cultivate three 
groups of vines, one with low fertilizer, one with a moderate amount, and one 
with a high amount. What sort of statistical test could he use for this experi- 
ment? What assumptions are involved? 


He could use analysis of variance, assuming approximately normal distributions 
and equal variances, or he could use a χ² test. 


5. Suppose that there is an interaction between the effects of the water and 
fertilizer. Perhaps fertilizer is effective only if there is also heavy watering or 
perhaps fertilizer helps to compensate for the effects of light watering. Will a 
study of watering alone or of fertilizing alone detect this interaction? 


No 
6. In this kind of situation it is possible, in effect, to perform both studies at 
once and study the results with a two-way analysis of variance. To do this 
the experimenter will set up six experimental groups as follows: 


Water             Fertilizer 
            Light    Medium    Heavy 
Heavy 
Light 


If he selects 10 plants at random for each group, how many plants will he 
have in all? 


60 


30 


20 


Let us establish some terminology. We call each group of 10 a cell; for example, 
the plants with heavy watering and medium fertilizer make up a cell. We 
call each level of watering a row; for example, all plants with light watering 
make up a row. We call each level of fertilizing a column; for example, all 
plants with light fertilizing make up a column. 


All plants with medium fertilizing would be called a 


column 
10. All plants with medium fertilizing and light watering are called a ________. 


cell 


11. All plants with heavy watering are called a ________. 


row 


12. Here are the results of the experiment (x is the number of melons on a vine): 


                                  Fertilizer 
Water          Light             Medium            Heavy             Totals 
             x      x²         x      x²         x      x²         x       x² 
Heavy        1       1         4      16         6      36 
             2       4         5      25         7      49 
             2       4         5      25         7      49 
             3       9         6      36         8      64 
             3       9         6      36         8      64 
             3       9         6      36         8      64 
             3       9         6      36         8      64 
             4      16         7      49         9      81 
             4      16         7      49         9      81 
             5      25         8      64        10     100 
            30     102        60     372        80     652       170     1126 
Light        5      25         3       9         0       0 
             6      36         4      16         1       1 
             6      36         4      16         1       1 
             7      49         5      25         2       4 
             7      49         5      25         2       4 
             7      49         5      25         2       4 
             7      49         5      25         2       4 
             8      64         6      36         3       9 
             8      64         6      36         3       9 
             9      81         7      49         4      16 
            70     502        50     262        20      52       140      816 
Totals     100     604       110     634       100     704 
Grand total                                                      310     1942 


As you can see, we have already computed Σx and Σx² for each cell, row, and 
column and for the total group. What is Σx for the heavy row? 


170 
13. What is Σx² for the light column? 


604 


14. As you know, analysis of variance allows us to estimate the variance of the 
population on the basis of the differences between group means and to compare 
this variance estimate with an estimate based on the individual differences 
within groups. We can group the data for a variance estimate based on group 
means in several ways. We can look at six individual cell means, at two row 
means, or at three column means. How many different variance estimates can 
we obtain from these groupings? 


Three. The variance of the individual cell means can be used to estimate 
population variance; so can the variance of row means and the variance of 
column means. 
16. Here are the formulas you will use for a two-way analysis of variance. Some 
of them you have already used for the one-way analysis of variance. Some of 
them are new. Which ones are new? Place a check mark next to the new ones. 


Total sum of squares 

\[ \sum x_T^2 - \frac{(\sum x_T)^2}{N} \]

df = N − 1 


Between groups sum of squares 

\[ \frac{(\sum x_1)^2}{n} + \frac{(\sum x_2)^2}{n} + \cdots - \frac{(\sum x_T)^2}{N} \]

df = rc − 1 


Between rows sum of squares 

\[ \frac{(\sum x_{\text{row 1}})^2}{nc} + \frac{(\sum x_{\text{row 2}})^2}{nc} + \cdots - \frac{(\sum x_T)^2}{N} \]

df = r − 1 


Between columns sum of squares 

\[ \frac{(\sum x_{\text{col 1}})^2}{nr} + \frac{(\sum x_{\text{col 2}})^2}{nr} + \cdots - \frac{(\sum x_T)^2}{N} \]

df = c − 1 


Interaction sum of squares 

between groups sum of squares − (between rows sum of squares + between 
columns sum of squares) or 

\[ \frac{(\sum x_1)^2}{n} + \frac{(\sum x_2)^2}{n} + \cdots - \frac{(\sum x_{\text{row 1}})^2}{nc} - \frac{(\sum x_{\text{row 2}})^2}{nc} - \cdots - \frac{(\sum x_{\text{col 1}})^2}{nr} - \frac{(\sum x_{\text{col 2}})^2}{nr} - \cdots + \frac{(\sum x_T)^2}{N} \]

df = (r − 1)(c − 1) 


Within groups sum of squares 

total sum of squares − between groups sum of squares or 

\[ \left[\sum x_1^2 - \frac{(\sum x_1)^2}{n}\right] + \left[\sum x_2^2 - \frac{(\sum x_2)^2}{n}\right] + \cdots \]

df = N − rc 


The new ones are: 


Between rows sum of squares 
Between columns sum of squares 
Interaction sum of squares 
17. The between groups sum of squares is based on the cells. It tells you exactly 
the same thing it would in a one-way analysis of variance. If you divide the 
between groups variance by the within groups variance and obtain a significant 
value of F, what does it tell you? Think of the null hypothesis. 


It tells you that the cell means are not all equal. 


18. Will a significant between groups variance tell you anything about whether 
water or fertilizer, or both, is the cause of the differences? 


No. You know only that at least one cell mean is significantly different from 
the others. 


19. Look closely at the formula for between groups sum of squares. It is identical 
to the one you used for one-way analysis of variance. In the formula for 
degrees of freedom r stands for the number of rows and c for the number of 
columns. In our problem what are the degrees of freedom for the between 
groups variance? 


(2 × 3) − 1 = 5 


20. Now look at the formulas for between rows sum of squares and between 
columns sum of squares. They are the same as the between groups formula, 
except that they are based on rows and columns instead of the individual cells. 
In our problem what value corresponds to Σx for row 1? 


170 


21. What value corresponds to Σx for column 1? 


100 
22. In the formula for between rows sum of squares n stands for the number of 
measurements in a cell and c for the number of columns. The number of 
measurements in a row is nc. In our problem this is ________. 


30 


23. If we divide the between rows variance estimate by the within groups variance 
estimate and obtain a significant F, we can conclude that ________ makes a 
difference in the mean number of melons on a plant. 


water 


24. If the between columns variance estimate is significant, we can conclude that 
________ makes a difference. 


fertilizer 


25. The between groups sum of squares will usually be larger than the sum of the 
between rows and between columns sums of squares; that is, part of the variance 
between cells will not be explained by either the difference between rows or 
the difference between columns. This remaining part of the difference between 
cells must be due to the combined effect of the two variables, that is, interaction. 


Suppose that fertilizer improves the yield of vines under most conditions but 
heavy watering combined with a medium application of fertilizer washes away 
the fertilizer so that it has no effect. In this case you would expect two significant 
variance estimates: ________ and ________. 


between columns 
interaction 
26. The results of a two-way analysis of variance for the melon problem are 
summarized below: 


                    Sum of squares    df    Variance estimate      F 
Total                   340.33        59 
Between rows             15.00         1         15.00           11.28 
Between columns           3.33         2          1.67            1.26 
Interaction             250.00         2        125.00           93.99 
Within groups            72.00        54          1.33 


(The between groups sum of squares is 268.33, but this figure is not usually 
included in a summary because the rows, columns, and interaction sums of 
squares give all the information.) 


Are any of these F ratios significant? If so, which ones? 


Between rows and interaction are both significant. Note that the F table has no 
entry corresponding to df = 54. The values for df = 50 and df = 55 are close 
enough to ensure that the results are significant. 


27. To interpret the results of a two-way analysis of variance it is helpful to 
examine the cell means and compare them with a mean for each row and column 
and a grand total mean. 


Water                       Fertilizer 
                    Light    Medium    Heavy    All conditions 
Heavy                3.0      6.0       8.0         5.7 
Light                7.0      5.0       2.0         4.7 
All conditions       5.0      5.5       5.0         5.2 (grand total) 


What combination produces the greatest yields? 


Heavy watering and heavy fertilizer 


28. What combination produces the poorest yields? 


Light watering and heavy fertilizer 
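
For readers who want to check the summary in frame 26, the sums of squares can be recomputed from the raw melon counts. The following is a minimal sketch, not the program that produced the printed results; the function name `two_way_anova` and the nested-list layout of the data are assumptions made for illustration, and equal cell sizes are assumed throughout.

```python
def two_way_anova(cells):
    """Two-way analysis of variance for equal-sized cells.

    cells[i][j] holds the n measurements for row i and column j.
    Returns ({source: (sum of squares, df)}, {source: F ratio}).
    """
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    N = n * r * c
    all_x = [x for row in cells for cell in row for x in cell]
    correction = sum(all_x) ** 2 / N                 # (sum of all x)^2 / N

    ss_total = sum(x * x for x in all_x) - correction
    ss_groups = sum(sum(cell) ** 2 for row in cells for cell in row) / n - correction
    row_sums = [sum(sum(cell) for cell in row) for row in cells]
    col_sums = [sum(sum(row[j]) for row in cells) for j in range(c)]
    ss_rows = sum(t ** 2 for t in row_sums) / (n * c) - correction
    ss_cols = sum(t ** 2 for t in col_sums) / (n * r) - correction
    ss_inter = ss_groups - (ss_rows + ss_cols)
    ss_within = ss_total - ss_groups

    table = {
        "rows": (ss_rows, r - 1),
        "columns": (ss_cols, c - 1),
        "interaction": (ss_inter, (r - 1) * (c - 1)),
        "within": (ss_within, N - r * c),
        "total": (ss_total, N - 1),
    }
    ms_within = ss_within / (N - r * c)              # within groups variance estimate
    f_ratios = {name: (table[name][0] / table[name][1]) / ms_within
                for name in ("rows", "columns", "interaction")}
    return table, f_ratios

# Rows: heavy water, light water. Columns: light, medium, heavy fertilizer.
melons = [
    [[1, 2, 2, 3, 3, 3, 3, 4, 4, 5],
     [4, 5, 5, 6, 6, 6, 6, 7, 7, 8],
     [6, 7, 7, 8, 8, 8, 8, 9, 9, 10]],
    [[5, 6, 6, 7, 7, 7, 7, 8, 8, 9],
     [3, 4, 4, 5, 5, 5, 5, 6, 6, 7],
     [0, 1, 1, 2, 2, 2, 2, 3, 3, 4]],
]

table, f_ratios = two_way_anova(melons)
for source, (ss, df) in table.items():
    print(f"{source:12s} SS = {ss:7.2f}  df = {df}")
for source, f in f_ratios.items():
    print(f"F for {source}: {f:.2f}")
```

The sums of squares agree with the summary table (15.00, 3.33, 250.00, 72.00, and 340.33). The F ratios come out as 11.25, 1.25, and 93.75 rather than 11.28, 1.26, and 93.99, apparently because the printed ratios were formed from the rounded variance estimates (for example 15.00 / 1.33).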
29. You are planning to raise melons in a situation in which you cannot control 
or predict the amount of watering. There is no irrigation and rainfall is 
unpredictably light or heavy. A low yield would be a disaster. What level of 
fertilizer would you use on the basis of the data in the experiment? 


Medium. It ensures a relatively high yield no matter what the level of watering. 
Either heavy or light carries a risk of lower yields, although both also bring a 
chance of higher yields. 
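
The reasoning in frame 29 amounts to choosing the fertilizer level whose worst outcome is best. As an illustration only (the dictionary layout and variable names are assumptions, and the values are the cell means from frame 27), it can be written out as:

```python
# Cell means (melons per vine) for each fertilizer level under each watering level.
cell_means = {
    "light":  {"heavy water": 3.0, "light water": 7.0},
    "medium": {"heavy water": 6.0, "light water": 5.0},
    "heavy":  {"heavy water": 8.0, "light water": 2.0},
}

# Worst-case mean yield for each fertilizer level, whatever the rainfall turns out to be.
worst_case = {fert: min(by_water.values()) for fert, by_water in cell_means.items()}

# Pick the level whose worst case is highest.
best = max(worst_case, key=worst_case.get)

print(worst_case)   # {'light': 3.0, 'medium': 5.0, 'heavy': 2.0}
print(best)         # medium
```

Medium fertilizer never averages below 5.0 melons per vine, whereas light and heavy fertilizer can fall to 3.0 and 2.0.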


30. Of course, the labor of computing an analysis of variance can be handled by a 
computer with great ease. A printout of the results of the melon problem 
would look like this: 


SOURCE OF               SUM OF            MEAN 
VARIATION               SQUARES    DF     SQUARE       F     SIGNIFICANCE 
MAIN EFFECTS 
  WATER                   15.00     1      15.00     11.28      0.000 
  FERTILIZER               3.33     2       1.67      1.26      0.691 
2-WAY INTERACTIONS 
  WATER FERTILIZER       250.00     2     125.00     93.99      0.000 
RESIDUAL                  72.00    54       1.33 
TOTAL                    340.33    59       5.77 

CELL MEANS 
                   WATER 
FERTILIZER        1        2 
    1            3.0      7.0 
                (10)     (10) 
    2            6.0      5.0 
                (10)     (10) 
    3            8.0      2.0 
                (10)     (10) 


In this particular printout the within groups result is called by a different 
name. What is the within groups result called here? 


RESIDUAL 
31. The two-way analysis of variance involves the same assumptions as other 
forms of analysis of variance. What are they? 


The populations involved are approximately normally distributed and have 
approximately equal variances. 


32. You expect one of your experimental treatments to have a dramatic effect 
on the variability of the data. Should you plan a two-way analysis of variance 
for your statistical test? 


33. You are studying measures that in your general experience are bimodally 
distributed. Should you plan to use analysis of variance for a statistical test? 


No. In this case a χ² test would probably be appropriate, since the data seem 
to fall into two groups. 


34. A chemist is studying the production of an organic compound in a process 
that involves a catalyst. She believes that the amount of the compound 
produced in a given processing time will vary according to the catalyst used 
and according to the temperature but that the relative effectiveness of the 
different catalysts is not influenced by the temperature. Assume that five 
catalysts and four temperature levels are to be investigated. The measure to be 
used will be units of the compound produced in a 5-minute processing run. 
The chemist will make a total of 200 processing runs. 


(a) Outline the design for an analysis of variance study. What will the rows 
and columns be and how many measurements will be made for each 
cell? 

(b) According to the chemist’s theory, which F ratios should be significant? 
Which should not be significant? 

Be sure to consider all F ratios. 


(a) Heat         Catalyst 
             A    B    C    D    E 
     1 
     2 
     3 
     4 

    10 measurements per cell 
(b) Both the row and column variances should be significant, but there should 
be no significant interaction because this would indicate a situation in 
which the relative effectiveness of the catalysts is influenced by temperature. 


35. A researcher believes that he has developed a test of language-learning ability 
that will predict a student’s ability to learn any language. He proposes to 
evaluate his test by using it to classify students as high, medium, or low in 
learning ability, giving them language training and then achievement tests. To 
make sure that the test works uniformly well for several languages he will train 
and test students in Spanish, Russian, Swahili, and Japanese. Thus he plans a 
two-way analysis of variance. In general, a student is trained and tested on 
only one language, but the researcher ran short of high-ability students. He 
then trained and tested 10 high-ability students in two languages each. Is this 
an acceptable procedure? 


No. Analysis of variance assumes independent random samples. 


36. Assuming that the researcher obtains an appropriate sample and sets up his 
study so that the different languages are the columns, which F ratios should 
be significant? 


The F ratio for rows (ability level) would be significant. The interaction should 
not be significant, since he expects the test to work equally well for all languages. 


37. What would a significant F ratio for columns (languages) mean? 


That the achievement tests are not of equal difficulty or that some languages 
are more difficult to learn than others. 
REVIEW PROBLEMS 


If you have successfully completed this chapter, you can now perform and interpret 
a two-way analysis of variance. You can: 


compute F ratios for rows, columns, and interaction; 

examine cell means to interpret the results of two-way analysis of variance; 

recognize situations where a two-way analysis of variance is not appropriate and 
suggest alternative tests. 


Now try these review problems. Table I on pp. 253-256 lists any formulas you 
may need for reference. 


1. A study is conducted of the effects of motivation and method of instruction 
on achievement in classroom learning. One hundred high school students are 
assigned at random to four groups. Half the students are promised 50¢ per 
point on the final exam. The others are paid $5 to participate in the experiment, 
regardless of how they score. Half the “high motivation” group (the 
group paid 50¢ per point) and half the “low motivation” group (the group paid 
regardless of score) are taught by one instructor who employs an inductive 
discovery approach. The remaining students are taught by a second instructor, 
who employs an expository deductive approach. A standardized final exam is 
administered to all students. The results of the study are as follows: 


                     Inductive      Deductive      Total 
High motivation      x̄ = 54         x̄ = 55         54.5 
                     (s = 5.1)      (s = 5.3) 
Low motivation       x̄ = 56         x̄ = 35         45.5 
                     (s = 4.9)      (s = 6.1) 
Total                55             45 


An analysis of variance shows no significant effect of motivation or of method 
of instruction, but the interaction is significant at the 5% level. The researcher 
concludes on the basis of these results that the inductive method is superior for 
poorly motivated students, whereas the deductive method is superior for 
well-motivated students. Do the data support these conclusions? Why? 
2. The following are dexterity scores for boys and girls at three age levels. A high 
score indicates many errors and low dexterity. Complete a two-way analysis of 
variance and interpret the results. Refer to the formulas on page 253 for 
guidance. 


                  Age 
           5-7    7-9    9-11 
Girls       5      3      1 
            6      4      2 
            6      5      2 
            6      6      2 
            7      7      3 
Boys        6      3      1 
            7      4      2 
            7      5      3 
            8      6      4 
            8      7      5 


3. You conclude from a two-way analysis of variance that age appears to be related 
to dexterity, but sex is not. What statistical techniques could you use to further 
describe and analyze this relationship? 
4. Summarize briefly the assumptions required for 


(a) a z test for the difference between the means of two samples 
(b) a χ² test 
(c) a t test for the difference between the means of two samples 
(d) analysis of variance 
(e) a test of the null hypothesis ρ = 0 


Answers 


To review a problem, study the frames indicated after the answer. 


1. There are two major difficulties with the researcher’s interpretation of his results. 
(a) The cell means do not fit his description of the interaction. In fact, the inter- 
action appears to consist only of the fact that deductive students with low 
motivation do less well than all the other groups. 

(b) Because each method was used by a single instructor, it is impossible to 
separate the method of instruction from his other characteristics—the loudness 
of his voice and the radiance of his smile. You can reject the null hypothesis that 
there is no difference between the groups, but there are numerous plausible 
alternatives. See frames 25, 34 to 37, and Chapter 4, frames 26 to 31. 


2. 
              5-7            7-9            9-11          Totals 
            x     x²       x     x²       x     x²       x      x² 
Girls       5     25       3      9       1      1 
            6     36       4     16       2      4 
            6     36       5     25       2      4 
            6     36       6     36       2      4 
            7     49       7     49       3      9 
           30    182      25    135      10     22      65     339 
Boys        6     36       3      9       1      1 
            7     49       4     16       2      4 
            7     49       5     25       3      9 
            8     64       6     36       4     16 
            8     64       7     49       5     25 
           36    262      25    135      15     55      76     452 
Totals     66    444      50    270      25     77     141     791 
Total sum of squares: 

\[ \sum x_T^2 - \frac{(\sum x_T)^2}{N} = 791 - \frac{(141)^2}{30} = 791 - 662.7 = 128.3 \]

df = N − 1 = 29 

Between groups sum of squares: 

\[ \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \frac{(\sum x_3)^2}{n_3} + \frac{(\sum x_4)^2}{n_4} + \frac{(\sum x_5)^2}{n_5} + \frac{(\sum x_6)^2}{n_6} - \frac{(\sum x_T)^2}{N} \]

\[ = \frac{(30)^2}{5} + \frac{(25)^2}{5} + \frac{(10)^2}{5} + \frac{(36)^2}{5} + \frac{(25)^2}{5} + \frac{(15)^2}{5} - 662.7 \]

\[ = 180.00 + 125.00 + 20.00 + 259.20 + 125.00 + 45.00 - 662.7 = 91.50 \]

df = g − 1 = 5 

Between rows sum of squares: 

\[ \frac{(\sum x_{\text{row 1}})^2}{nc} + \frac{(\sum x_{\text{row 2}})^2}{nc} - \frac{(\sum x_T)^2}{N} = \frac{(65)^2}{15} + \frac{(76)^2}{15} - 662.7 = 281.67 + 385.07 - 662.7 = 4.04 \]

df = r − 1 = 1 

Between columns sum of squares: 

\[ \frac{(\sum x_{\text{col 1}})^2}{nr} + \frac{(\sum x_{\text{col 2}})^2}{nr} + \frac{(\sum x_{\text{col 3}})^2}{nr} - \frac{(\sum x_T)^2}{N} = \frac{(66)^2}{10} + \frac{(50)^2}{10} + \frac{(25)^2}{10} - 662.7 \]

\[ = 435.6 + 250.0 + 62.5 - 662.7 = 85.40 \]

df = c − 1 = 2 

Interaction sum of squares: 
Between groups − (between rows + between columns) = 91.50 − (85.40 + 4.04) = 2.06 

Within groups sum of squares: 
Total − between groups = 128.30 − 91.50 = 36.80 

                  Sum of Squares    df    Variance Estimate      F 
Total                 128.30        29 
Rows (sex)              4.04         1         4.040            2.64 
Columns (age)          85.40         2        42.700           27.85 
Interaction             2.06         2         1.030            0.67 
Within groups          36.80        24         1.533 


The age groupings have a significant effect on scores but sex has not. See frames 
16 to 29. 

3. The techniques of correlation and regression would be appropriate: scattergrams, 
correlation coefficients, and regression equations. See Chapter 7. 
4. (a) Both samples are larger than 30. 
See Chapter 5, frames 28 to 34. 

(b) All cells have predicted frequency greater than 5. 
See Chapter 8, frames 26 to 30. 

(c) Both populations are approximately normally distributed and have 
approximately equal variances. 
See Chapter 5, frames 28 to 34. 

(d) All populations are approximately normally distributed and have 
approximately equal variances. 
See Chapter 6, frames 69 to 74. 

(e) Both scores are normally distributed; sample is drawn at random. 
See Chapter 7, frames 35 to 45. 


TABLE I REFERENCE FORMULAS 


Parameters and Statistics 


= EX ae nk 
ey a kT 
pee y= — mp)? ai [Sx? — (ZxP/n 
n n 
_ yy ee oe 2 
ve ae = a te “ /n 
xX — pb X~-U 
a a ola 
L y(e-%) V=D 9 nSby) - ENEY) 
ae Sy [n= x? — (Ex) ]}[nzy — (Sy) 


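Central Limit Theorem for Large Samples 

\[ \mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]


Regression Analysis 

\[ y = \bar{y} + b(x - \bar{x}) \]

\[ b = \frac{n\sum(xy) - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \]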


Confidence Intervals 


Hypothesis Testing 
Null Hypothesis μ = C 

\[ z = \frac{\bar{x} - C}{s/\sqrt{n}} \]

\[ t = \frac{\bar{x} - C}{s/\sqrt{n}} \]
Null Hypothesis μ₁ = μ₂ 

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2/n_1 + s^2/n_2}} \qquad s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]


Null Hypothesis σ₁ = σ₂ 

\[ F = \frac{s_1^2}{s_2^2} \]


Null Hypothesis sample is a random sample from a population with a given 
distribution. 

\[ \chi^2 = \sum \frac{(f - F)^2}{F} \]

df = c − 1 or (c − 1)(r − 1) 


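Null Hypothesis μ₁ = μ₂ = μ₃ = etc. 
ONE-WAY ANALYSIS OF VARIANCE 

\[ F = \frac{\text{between groups variance}}{\text{within groups variance}} \]


Total sum of squares 

\[ \sum x_T^2 - \frac{(\sum x_T)^2}{N} \]

df = N − 1 

Between groups sum of squares 

\[ \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \cdots - \frac{(\sum x_T)^2}{N} \]

df = g − 1 


Within groups sum of squares 
Total sum of squares − between groups sum of squares 

or 

\[ \left[\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}\right] + \left[\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}\right] + \cdots \]

df = N − g 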
TWO-WAY ANALYSIS OF VARIANCE 

\[ F = \frac{\text{between rows variance}}{\text{within groups variance}} \]

\[ F = \frac{\text{between columns variance}}{\text{within groups variance}} \]

\[ F = \frac{\text{interaction variance}}{\text{within groups variance}} \]


Total sum of squares 

\[ \sum x_T^2 - \frac{(\sum x_T)^2}{N} \]

df = N − 1 


Between groups sum of squares 

\[ \frac{(\sum x_1)^2}{n} + \frac{(\sum x_2)^2}{n} + \cdots - \frac{(\sum x_T)^2}{N} \]

df = rc − 1 


Between rows sum of squares 

\[ \frac{(\sum x_{\text{row 1}})^2}{nc} + \frac{(\sum x_{\text{row 2}})^2}{nc} + \cdots - \frac{(\sum x_T)^2}{N} \]

df = r − 1 


Between columns sum of squares 

\[ \frac{(\sum x_{\text{col 1}})^2}{nr} + \frac{(\sum x_{\text{col 2}})^2}{nr} + \cdots - \frac{(\sum x_T)^2}{N} \]

df = c − 1 


Interaction sum of squares 
Between groups sum of squares − [between rows sum of squares + between 
columns sum of squares] 

or 

\[ \frac{(\sum x_1)^2}{n} + \frac{(\sum x_2)^2}{n} + \cdots - \frac{(\sum x_{\text{row 1}})^2}{nc} - \frac{(\sum x_{\text{row 2}})^2}{nc} - \cdots - \frac{(\sum x_{\text{col 1}})^2}{nr} - \frac{(\sum x_{\text{col 2}})^2}{nr} - \cdots + \frac{(\sum x_T)^2}{N} \]

df = (r − 1)(c − 1) 

Within groups sum of squares 
Total sum of squares − between groups sum of squares 

or 

\[ \left[\sum x_1^2 - \frac{(\sum x_1)^2}{n}\right] + \left[\sum x_2^2 - \frac{(\sum x_2)^2}{n}\right] + \cdots \]

df = N − rc 
HOW TO USE TABLE II 


To find the square root of a number between 1 and 10, find your number in the 
column headed N. Look across to the column headed √N for the square root. For 
example, to find the square root of 5: 

locate 5.00 under N 

look across to find √N = 2.23607 

To find the square root of a number from 10 to 100, divide the number by 10. Find 
the result in the column headed N. Then look across to the column headed √10N 
for your square root. For example, to find the square root of 50: 

50 ÷ 10 = 5 

locate 5.00 in the N column 

look across to find √10N = 7.07107 

To find the square root of a number over 100, move the decimal point an even 
number of places to the left until you have a number between 1 and 100. Look up 
the square root of that number in the table, as shown above. Then move the 
decimal point in the answer back to the right half as many places as you moved to 
the left. For example, to find the square root of 5000: 

move the decimal point two places to the left: 50.00 

√50 = 7.07107 

move the decimal point in the answer one place to the right: 70.7107 

√5000 = 70.7107 

To find the square root of a number less than one, move the decimal point an even 
number of places to the right until you have a number between 1 and 100. Look up 
the square root of that number in the table, as shown above. Then move the 
decimal point in the answer back to the left half as many places as you moved to 
the right. For example, to find the square root of 0.0710: 

move the decimal point two places to the right: 07.10 

√7.10 = 2.66458 

move the decimal point one place to the left: .266458 

√0.0710 = 0.266458 
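
The decimal-shifting procedure described above can also be written as a short program. This is only an illustrative sketch: the function name `sqrt_by_table` is made up, and `math.sqrt` stands in for the table lookup (the √N column for values from 1 to 10, the √10N column for values from 10 to 100).

```python
import math

def sqrt_by_table(x):
    """Square root by the table procedure: shift the decimal point an even
    number of places until x lies between 1 and 100, look up the root, then
    shift the answer back half as many places in the opposite direction."""
    if x <= 0:
        raise ValueError("x must be positive")
    shifts = 0                   # net number of two-place shifts to the left
    scaled = x
    while scaled >= 100:         # numbers over 100: move two places left
        scaled /= 100
        shifts += 1
    while scaled < 1:            # numbers under 1: move two places right
        scaled *= 100
        shifts -= 1
    root = math.sqrt(scaled)     # stands in for reading the table
    return root * 10 ** shifts   # one place back for every two places moved

print(round(sqrt_by_table(5), 5))        # 2.23607
print(round(sqrt_by_table(50), 5))       # 7.07107
print(round(sqrt_by_table(5000), 4))     # 70.7107
print(round(sqrt_by_table(0.0710), 6))   # 0.266458
```

The four calls reproduce the worked examples in the instructions.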
TABLE II SQUARES AND SQUARE ROOTS 


  N      N²       √N       √10N        N      N²       √N       √10N 

1.00   1.0000   1.00000   3.16228    1.50   2.2500   1.22474   3.87298 
1.01   1.0201   1.00499   3.17805    1.51   2.2801   1.22882   3.88587 
1.02   1.0404   1.00995   3.19374    1.52   2.3104   1.23288   3.89872 
1.03   1.0609   1.01489   3.20936    1.53   2.3409   1.23693   3.91152 
1.04   1.0816   1.01980   3.22490    1.54   2.3716   1.24097   3.92428 
1.05   1.1025   1.02470   3.24037    1.55   2.4025   1.24499   3.93700 
1.06   1.1236   1.02956   3.25576    1.56   2.4336   1.24900   3.94968 
1.07   1.1449   1.03441   3.27109    1.57   2.4649   1.25300   3.96232 
1.08   1.1664   1.03923   3.28634    1.58   2.4964   1.25698   3.97492 
1.09   1.1881   1.04403   3.30151    1.59   2.5281   1.26095   3.98748 
1.10   1.2100   1.04881   3.31662    1.60   2.5600   1.26491   4.00000 
1.11   1.2321   1.05357   3.33167    1.61   2.5921   1.26886   4.01248 
1.12   1.2544   1.05830   3.34664    1.62   2.6244   1.27279   4.02492 
1.13   1.2769   1.06301   3.36155    1.63   2.6569   1.27671   4.03733 
1.14   1.2996   1.06771   3.37639    1.64   2.6896   1.28062   4.04969 
1.15   1.3225   1.07238   3.39116    1.65   2.7225   1.28452   4.06202 
1.16   1.3456   1.07703   3.40588    1.66   2.7556   1.28841   4.07431 
1.17   1.3689   1.08167   3.42053    1.67   2.7889   1.29228   4.08656 
1.18   1.3924   1.08628   3.43511    1.68   2.8224   1.29615   4.09878 
1.19   1.4161   1.09087   3.44964    1.69   2.8561   1.30000   4.11096 
1.20   1.4400   1.09545   3.46410    1.70   2.8900   1.30384   4.12311 
1.21   1.4641   1.10000   3.47851    1.71   2.9241   1.30767   4.13521 
1.22   1.4884   1.10454   3.49285    1.72   2.9584   1.31149   4.14729 
1.23   1.5129   1.10905   3.50714    1.73   2.9929   1.31529   4.15933 
1.24   1.5376   1.11355   3.52136    1.74   3.0276   1.31909   4.17133 
1.25   1.5625   1.11803   3.53553    1.75   3.0625   1.32288   4.18330 
1.26   1.5876   1.12250   3.54965    1.76   3.0976   1.32665   4.19524 
1.27   1.6129   1.12694   3.56371    1.77   3.1329   1.33041   4.20714 
1.28   1.6384   1.13137   3.57771    1.78   3.1684   1.33417   4.21900 
1.29   1.6641   1.13578   3.59166    1.79   3.2041   1.33791   4.23084 
1.30   1.6900   1.14018   3.60555    1.80   3.2400   1.34164   4.24264 
1.31   1.7161   1.14455   3.61939    1.81   3.2761   1.34536   4.25441 
1.32   1.7424   1.14891   3.63318    1.82   3.3124   1.34907   4.26615 
1.33   1.7689   1.15326   3.64692    1.83   3.3489   1.35277   4.27785 
1.34   1.7956   1.15758   3.66060    1.84   3.3856   1.35647   4.28952 
1.35   1.8225   1.16190   3.67423    1.85   3.4225   1.36015   4.30116 
1.36   1.8496   1.16619   3.68782    1.86   3.4596   1.36382   4.31277 
1.37   1.8769   1.17047   3.70135    1.87   3.4969   1.36748   4.32435 
1.38   1.9044   1.17473   3.71484    1.88   3.5344   1.37113   4.33590 
1.39   1.9321   1.17898   3.72827    1.89   3.5721   1.37477   4.34741 
1.40   1.9600   1.18322   3.74166    1.90   3.6100   1.37840   4.35890 
1.41   1.9881   1.18743   3.75500    1.91   3.6481   1.38203   4.37035 
1.42   2.0164   1.19164   3.76829    1.92   3.6864   1.38564   4.38178 
1.43   2.0449   1.19583   3.78153    1.93   3.7249   1.38924   4.39318 
1.44   2.0736   1.20000   3.79473    1.94   3.7636   1.39284   4.40454 
1.45   2.1025   1.20416   3.80789    1.95   3.8025   1.39642   4.41588 
1.46   2.1316   1.20830   3.82099    1.96   3.8416   1.40000   4.42719 
1.47   2.1609   1.21244   3.83406    1.97   3.8809   1.40357   4.43847 
1.48   2.1904   1.21655   3.84708    1.98   3.9204   1.40712   4.44972 
1.49   2.2201   1.22066   3.86005    1.99   3.9601   1.41067   4.46094 
1.50   2.2500   1.22474   3.87298    2.00   4.0000   1.41421   4.47214 


257 


258 


2.02 
2.03 


2.04 
2.05 
2.06 


2.07 
2.08 
2.09 


2.10 


2.11 
2.12 
2.13 


2.14 
2.15 
2.16 


2.17 
2.18 
2.19 


2.20 


2.21 
2.22 
2.23 


2.24 
2.25 
2.26 


2.27 
2.28 
2.29 


2.30 


2:31 
2.32 
2.33 


2.34 
2.35 
2.36 


2.37 
2.38 
2.39 


2.40 


2.41 
2.42 
2.43 


2.44 
2.45 
2.46 


2.47 
2.48 
2.49 


2.50 6.2500 1.58114 
= 


oj we | oe 
2.00 4.0000 1.41421 


2.01 - 


1.41774 
1.42127 
1.42478 


1.42829 
1.43178 


1.43527 


1.43875 
1.44222 


1.45258 
1.45602 
1.45945 


1.46287 
1.46629 
1.46969 


1.47309 
1.47648 


1.47986 


1.48661 
1.48997 
1.49332 


1.49666 
1.50000 
1.50333 


1.50665 
1.50997 
1.51327 


1.51987 
1.52315 
1.52643 


1.52971 
1.53297 
1.53623 


1.53948 
1.54272 


1.55252 
1.55563 
1.55885 


1.56205 


1.56525 
1.56844 


1.57162 
1.57480 
1.57797 


TABLE II (continued) 


ee 
2.50 6.2500 1.58114 


10N 


4.47214 


4.48330 
4.49444 
4.50555 


4.51664 
4.52769 
4.53872 


4.54973 
4.56070 
4.57165 


4.58258 


4.59347 
4.60435 
4.61519 


4.62601 
4.63681 
4.64758 


4.65833 
4.66905 
4.67974 


4.69042 


4.70106 
4.71169 
4.72229 


4.73286 
4.74342 
4.75395 


4.76445 
4.77493 
4.78539 


4.79583 


4.80625 
4.81664 
4.82701 


4.83735 
4.84768 
4.85798 


4.86826 
4.87852 
4.88876 


4.89898 


4.90918 
4.91935 
4.92950 


4.93964 
4.94975 
4.95984 


4.96991 
4.97996 
4.98999 


5.00000 


2.51 
2.52 
2.53 


2.54 
2.55 
2.56 


2.57 
2.58 
2.59 


2.60 


2.61 
2.62 
2.63 


2.64 
2.65 
2.66 


2.67 
2.68 
2.69 


2.70 


2.71 
2.72 
2.73 


2.74 
2.75 
2.76 


2.77 
2.78 
2.79 


2.80 


2.81 
2.82 
2.83 


2.84 
2.85 
2.86 


2.87 
2.88 
2.89 


2.90 


2.91 
2.92 
2.93 


2.94 
2.95 
2.96 


2.97 
2.98 
2.99 


3.00 9.0000 1.73205 
ri 


1.58430 
1.58745 
1.59060 


1.59374 
1.59687 
1.60000 


1.60312 
1.60624 
1.60935 


1.61555 
1.61864 
1.62173 


1.62481 
1.62788 
1.63095 


1.63401 
1.63707 
1.64012 


1.64621 
1.64924 
1.65227 


1.65529 
1.65831 
1.66132 


1.66433 
1.66733 
1.67033 


1.67631 
1.67929 
1.68226 


1.68523 
1.68819 
1.69115 


1.69411 
1.69706 
1.70000 


1.70587 
1.70880 
1.71172 


1.71464 


1.71756 
1.72047 


1.72337 
1.72627 
1.72916 


5.00000 


5.00999 
5.01996 
5.02991 


5.03984 
5.04975 
5.05964 


5.06952 
5.07937 
5.08920 


5.09902 


5.10882 
5.11859 
5.12835 


5.13809
5.14782
5.15752


5.16720 
5.17687 
5.18652 


5.19615 


5.20577 
5.21536 
5.22494 


5.23450 
5.24404 
5.25357 


5.26308 
5.27257 
5.28205 


5.29150 


5.30094 
5.31037 
5.31977 


5.32917 
5.33854 
5.34790 


5.35724 
5.36656 
5.37587 


5.38516 


5.39444 
5.40370 
5.41295 


5.42218 
5.43139 
5.44059 


5.44977 
5.45894 
5.46809 


5.47723 


3.00 


3.01 
3.02 
3.03 


3.04 
3.05 
3.06 


3.07 
3.08 
3.09 


3.10 


3.11 
3.12 
3.13 


3.14 
3.15 
3.16 


3.17 
3.18 
3.19 


3.20 


3.21 
3.22 
3.23 


3.24 
3.25 
3.26 


3.27 
3.28 
3.29 


3.30 


3.31 
3.32 
3.33 


3.34 
3.35 
3.36 


3.37 
3.38 
3.39 


3.40 


3.41
3.42 
3.43 


3.44 
3.45 
3.46 


3.47 
3.48 
3.49 


3.50 


9.0000 1.73205 


1.73494 
1.73781
1.74069 


1.74356 
1.74642 


1.74929 


1.75214 
1.75499 
1.75784 


1.76352 
1.76635 
1.76918 


1.77200 
1.77482 
1.77764 


1.78045 
10.1124 1.78326 
10.1761 1.78606 


10.2400 1.78885 


10.3041 1.79165 
10.3684 1.79444 
10.4329 1.79722 


10.4976 1.80000 
10.5625 1.80278 
10.6276 1.80555 


10.6929 1.80831 
10.7584 1.81108 
10.8241 1.81384 


10.8900 1.81659 


10.9561 1.81934 
11.0224 1.82209
11.0889 1.82483 


11.1556 1.82757 
11.2225 1.83030 
11.2896 1.83303 


11.3569 1.83576 
11.4244 1.83848 
11.4921 1.84120 


11.5600 1.84391 


11.6281 1.84662 
11.6964 1.84932 
11.7649 1.85203 


11.8336 1.85472 
1.85742 
1.86011 


1.86279 
12.1104 1.86548 
12.1801 1.86815 


12.2500 1.87083 


10.0489 


11.9025 
11.9716 


12.0409 


TABLE II 


√10N


5.47723 


5.48635 
5.49545 
5.50454


5.51362 
5.52268 
5.53173 


5.54076 
5.54977 
5.55878 


5.56776


5.57674 
5.58570
5.59464 


5.60357 
5.61249 
5.62139 


5.63028
5.63915 
5.64801 


5.65685 


5.66569 
5.67450 
5.68331 


5.69210 
5.70088 
5.70964 


5.71839 
5.72713
5.73585


5.74456 


5.75326
5.76194 
5.77062 


5.77927 
5.78792 
5.79655


5.80517 
5.81378 
5.82237


5.83095 


5.83952
5.84808 
5.85662 


5.86515 
5.87367 
5.88218 


5.89067 
5.89915 
5.90762 


5.91608 


√10N


N 
3.50 12.2500 1.87083 


3.51 
3.52 
3.53 


3.54 
3.55 
3.56 


3.57 
3.58 
3.59 


3.60 


3.61 
3.62 
3.63 


3.64 
3.65 
3.66 


3.67 
3.68 
3.69 


3.70 


3.71 
3.72 
3.73 


3.74 
3.75 
3.76 


3.77 
3.78 
3.79 


3.80 


3.81 
3.82 
3.83 


3.84 
3.85 
3.86 


3.87 
3.88 
3.89 


3.90 


3.91 
3.92 
3.93 


3.94 
3.95 
3.96 


3.97 
3.98 
3.99 


4.00 


(continued) 
1.87350
1.87617 
1.87883 


1.88149 
1.88414 
1.88680 


1.88944 


12.3201 
12.3904 
12.4609 


12.5316 


12.6025 
12.6736 


12.7449 
12.8164 1.89209 
12.8881 1.89473 


13.0321 1.90000 
13.1044 1.90263 
13.1769 1.90526 


13.2496 1.90788 
13.3225 1.91050 
13.3956 1.91311 


13.4689 1.91572 
13.5424 1.91833 
13.6161 1.92094 


13.6900 1.92354 


13.7641 1.92614 
13.8384 1.92873 
13.9129 1.93132 


13.9876 1.93391 
14.0625 1.93649 
14.1376 1.93907 


14.2129 1.94165 
14.2884 1.94422 
14.3641 1.94679 


14.4400 1.94936 


14.5161 1.95192 
14.5924 1.95448 
14.6689 1.95704 


14.7456 1.95959 
14.8225 1.96214 
14.8996 1.96469 


14.9769 1.96723 
15.0544 1.96977 
15.1321 1.97231 


15.2100 1.97484 


15.2881 1.97737 
15.3664 1.97990
15.4449 1.98242 


15.5236 1.98494 
1.98746 
1.98997 


1.99249 
15.8404 1.99499 
15.9201 1.99750


16.0000 2.00000 


15.6025 
15.6816 


15.7609 


5.91608 


5.92453 
5.93296 
5.94138 


5.94979
5.95819 
5.96657 


5.97495 
5.98331 
5.99166 


6.00000 


6.00833 
6.01664 
6.02495 


6.03324 
6.04152 
6.04979


6.05805 
6.06630 
6.07454 


6.08276 


6.09098 
6.09918 
6.10737 


6.11555 
6.12372 
6.13188 


6.14003 
6.14817 
6.15630 


6.16441 


6.17252 
6.18061 
6.18870 


6.19677 
6.20484 
6.21289 


6.22093 
6.22896 
6.23699 


6.24500 


6.25300 
6.26099 
6.26897 


6.27694 
6.28490 
6.29285 


6.30079 
6.30872 
6.31664


6.32456 


√10N




N 


N 


16.0000 2.00000 


16.0801 2.00250 
16.1604 2.00499 
16.2409 2.00749 


16.3216 
16.4025 
16.4836 


16.5649 


2.00998 
2.01246 
2.01494 


2.01742 
16.6464 2.01990 
16.7281 2.02237 


16.8921 2.02731 
16.9744 2.02978 
17.0569 2.03224 


2.03470
2.03715 
2.03961 


2.04206 
17.4724 2.04450 
17.5561 2.04695 


17.6400 2.04939


17.7241 2.05183 
17.8084 2.05426 
17.8929 2.05670 


17.9776 2.05913 
18.0625 2.06155 
18.1476 2.06398 


18.2329 2.06640 
18.3184 2.06882 
18.4041 2.07123 


18.4900 2.07364 


18.5761 2.07605 
18.6624 2.07846 
18.7489 2.08087 


18.8356 2.08327 
18.9225 2.08567 
19.0096 2.08806 


19.0969 2.09045 
19.1844 2.09284 
19.2721 2.09523


19.3600 2.09762 


19.4481 2.10000 
19.5364 2.10238 
19.6249 2.10476 


19.7136 2.10713 
2.10950 
2.11187 


2.11424 
20.0704 2.11660 
20.1601 2.11896 


20.2500 2.12132


17.1396 


17.2225 
17.3056 


17.3889 


19.8025 
19.8916 


19.9809 


TABLE II 


√10N


6.32456 


6.33246 
6.34035 
6.34823 


6.35610 
6.36396 
6.37181 


6.37966 
6.38749 
6.39531 


6.40312 


6.41093 
6.41872 
6.42651 


6.43428 
6.44205 
6.44981 


6.45755 
6.46529 
6.47302 


6.48074 


6.48845 
6.49615 
6.50384 


6.51153 
6.51920 
6.52687 


6.53452 
6.54217 
6.54981 


6.55744 


6.56506
6.57267 
6.58027 


6.58787 
6.59545 
6.60303 


6.61060 
6.61816 
6.62571 


6.63325 


6.64078 
6.64831 
6.65582 


6.66333 
6.67083 
6.67832 


6.68581 
6.69328 
6.70075 


6.70820 


(continued)


20.2500 2.12132 


20.3401 2.12368 
20.4304 2.12603 
20.5209 2.12838 


20.6116 
20.7025 
20.7936 


20.8849 
20.9764 
21.0681 


2.13073 
2.13307 
2.13542 


2.13776 
2.14009 
2.14243 


21.1600 2.14476 
21.2521 2.14709 
21.3444 2.14942 
21.4369 2.15174 
21.5296 2.15407 
21.6225 2.15639 
21.7156 2.15870 
21.8089 2.16102 
21.9024 2.16333 
21.9961 2.16564 
22.0900 2.16795 
22.1841 2.17025 
22.2784 2.17256 
22.3729 2.17486 
22.4676 2.17715 
22.5625 2.17945 
22.6576 2.18174 
22.7529 2.18403 
22.8484 2.18632 
22.9441 2.18861 
23.0400 2.19089 


23.1361 
23.2324 
23.3289 


23.4256 
23.5225 
23.6196 


23.7169 
23.8144 2.20907 
23.9121 2.21133 


24.0100 2.21359 


24.1081 2.21585 
24.2064 2.21811 
24.3049 2.22036 


24.4036 2.22261 
2.22486 
2.22711


2.22935 


2.19317 
2.19545 
2.19773 


2.20000 
2.20227 
2.20454 


2.20681 


24.5025 
24.6016 


24.7009 
24.8004 2.23159 
24.9001 2.23383 


25.0000 2.23607 


√10N


6.70820 


6.71565 
6.72309 
6.73053 


6.73795 
6.74537 
6.75278 


6.76018 
6.76757 
6.77495 


6.78233 


6.78970 
6.79706 
6.80441 


6.81175 
6.81909 
6.82642 


6.83374 
6.84105 
6.84836 


6.85565 


6.86294 
6.87023 
6.87750 


6.88477 
6.89202 
6.89928 


6.90652 
6.91375 
6.92098 


6.92820 


6.93542 
6.94262 
6.94982 


6.95701 
6.96419 
6.97137 


6.97854 
6.98570 
6.99285 


7.00000 


7.00714 
7.01427 
7.02140 


7.02851 
7.03562 
7.04273 


7.04982 
7.05691 
7.06399 


7.07107 


N 


5.00 


5.50 


25.0000 2.23607 


25.1001 2.23830 
25.2004 2.24054 
25.3009 2.24277 
25.4016 2.24499 
25.5025 2.24722 
25.6036 2.24944 
25.7049 2.25167 
25.8064 2.25389 
25.9081 2.25610 
26.0100 2.25832 
26.1121 2.26053 
26.2144 2.26274 
26.3169 2.26495 
26.4196 2.26716 
26.5225 2.26936 
26.6256 2.27156 
26.7289 2.27376 
26.8324 2.27596 
26.9361 2.27816 
27.0400 2.28035 
27.1441 2.28254 
27.2484 2.28473 
27.3529 2.28692 
27.4576 2.28910 
27.5625 2.29129 
27.6676 2.29347 
27.7729 2.29565 
27.8784 2.29783 
27.9841 2.30000 
28.0900 2.30217 
28.1961 2.30434 
28.3024 2.30651 
28.4089 2.30868 
28.5156 2.31084 
28.6225 2.31301 
28.7296 2.31517 
28.8369 2.31733 
28.9444 2.31948 
29.0521 2.32164 
29.1600 2.32379 


29.2681 
29.3764 
29.4849 


29.5936 


2.32594 
2.32809 
2.33024 


2.33238 
2.33452 
2.33666 


2.33880 
30.0304 2.34094 
30.1401 2.34307 


30.2500 2.34521 


29.7025 
29.8116 


29.9209 


TABLE II 


√10N


7.07107 


7.07814 
7.08520 
7.09225 


7.09930 
7.10634 
7.11337 


7.12039 
7.12741 
7.13442 


7.14143 


7.14843 
7.15542 
7.16240 


7.16938 
7.17635 
7.18331 


7.19027 
7.19722 
7.20417 


7.21110 


7.21803 
7.22496 
7.23187 


7.23878
7.24569 
7.25259 


7.25948 
7.26636 
7.27324 


7.28011 


7.28697 
7.29383 
7.30068 


7.30753 
7.31437 
7.32120 


7.32803 
7.33485 
7.34166 


7.34847 


7.35527 
7.36206 
7.36885 


7.37564 
7.38241 
7.38918 


7.39594 
7.40270 
7.40945 


7.41620 




N 


5.50 


5.51 
5.52 
5.53 


5.54 
5.55 
5.56 


5.57 
5.58 
5.59 


5.60 


5.61 
5.62 
5.63 


5.64 
5.65 
5.66 


5.67 
5.68 
5.69 


5.70 


5.71
5.72 
5.73 
5.74 


5.75 
5.76 


5.77 
5.78 
5.79 


5.80 


5.81 
5.82
5.83 


5.84 
5.85 
5.86 


5.87 
5.88 
5.89 


5.90 


5.91 
5.92 
5.93 


5.94 
5.95 
5.96 


5.97 
5.98 
5.99 


6.00 
30.2500 2.34521 


30.3601 
30.4704 
30.5809 


30.6916 
30.8025 
30.9136 


31.0249 
31.1364 2.36220 
31.2481 2.36432 


31.4721 2.36854 
31.5844 2.37065 
31.6969 2.37276 


2.37487 
2.37697 
2.37908 


2.38118 


2.34734 
2.34947 
2.35160 


2.35372 
2.35584 
2.35797 


2.36008 


31.8096 


31.9225 
32.0356 


32.1489 
32.2624 2.38328 
32.3761 2.38537 


32.6041 2.38956 
32.7184 2.39165 
32.8329 2.39374 


32.9476 2.39583 
33.0625 2.39792 
33.1776 2.40000 


33.2929 2.40208 
33.4084 2.40416 
33.5241 2.40624 


33.6400 2.40832


2.41039 
2.41247 
2.41454 


2.41661 
2.41868 
2.42074 


2.42281 


33.7561 
33.8724 
33.9889 


34.1056 
34.2225 
34.3396 


34.4569 
34.5744 2.42487
34.6921 2.42693


34.8100 2.42899 


34.9281 2.43105 
35.0464 2.43311 
35.1649 2.43516 


35.2836 2.43721 
2.43926 
2.44131 


2.44336 
35.7604 2.44540 
35.8801 2.44745 


36.0000 2.44949 


35.4025 
35.5216 


35.6409 


√10N


7.41620 


7.42294 
7.42967 
7.43640 


7.44312 
7.44983 
7.45654 


7.46324
7.46994 
7.47663 


7.48331 


7.48999 
7.49667 
7.50333 


7.50999 
7.51665 
7.52330 


7.52994 
7.53658 
7.54321 


7.54983 


7.55645 
7.56307 
7.56968 


7.57628 
7.58288 
7.58947 


7.59605 
7.60263 
7.60920 


7.61577 


7.62234 
7.62889 
7.63544 


7.64199 
7.64853 
7.65506 


7.66159 
7.66812 
7.67463 


7.68115 


7.68765 
7.69415 
7.70065 


7.70714 
7.71362 
7.72010 


7.72658 
7.73305 
7.73951 


7.74597 




6.00 
6.01 


6.02 


6.03 


6.04 
6.05 
6.06 
6.07 


6.08 
6.09 


6.10 


6.11 
6.12 
6.13 


6.14 
6.15 
6.16 
6.17 
6.18 
6.19 


6.20 


6.21 
6.22 
6.23 
6.24 
6.25 
6.26 


6.27 
6.28 
6.29 


6.30 


6.31 
6.32 
6.33 


6.34 
6.35 
6.36 


6.37 
6.38 
6.39 


6.40 


6.41 
6.42 
6.43 


6.44 
6.45 
6.46 


6.47 
6.48 
6.49 


6.50 


N 


36.0000 2.44949


36.1201 
36.2404 
36.3609 


36.4816 
36.6025 


36.7236 


36.8449 
36.9664 
37.0881 


37.3321 
37.4544 
37.5769 


37.6996 
37.8225 
37.9456 


38.0689 
38.1924 
38.3161 


38.5641 
38.6884 
38.8129 


38.9376 
39.0625 
39.1876 


39.3129 
39.4384 
39.5641 


39.8161 
39.9424 
40.0689 


40.1956 
40.3225 
40.4496 


40.5769 
40.7044 
40.8321 


41.0881 
41.2164 
41.3449 


41.4736 
41.6025 
41.7316 


41.8609 
41.9904 
42.1201 


N²


2.45153 
2.45357 
2.45561


2.45764 
2.45967 
2.46171 


2.46374 
2.46577 
2.46779 


2.47184 
2.47386 
2.47588 


2.47790 
2.47992 
2.48193 


2.48395 
2.48596 
2.48797 


2.49199 
2.49399 
2.49600 


2.49800 
2.50000 
2.50200 


2.50400 
2.50599 
2.50799 


2.51197 
2.51396 
2.51595 


2.51794 
2.51992 
2.52190 


2.52389 
2.52587 
2.52784 


40.9600 2.52982 


2.53180 
2.53377 
2.53574 


2.53772 


2.53969 
2.54165 


2.54362 
2.54558 
2.54755 


42.2500 2.54951


TABLE II (continued) 


√10N


7.74597 


7.75242 
7.75887 
7.76531


7.77174 
7.77817 
7.78460 


7.79102 
7.79744 
7.80385 


7.81025 


7.81665 
7.82304 
7.82943 


7.83582 
7.84219 
7.84857 


7.85493 
7.86130 
7.86766 


7.87401 


7.88036 
7.88670 
7.89303 


7.89937 
7.90569 
7.91202 


7.91833 
7.92465 
7.93095 


7.93725 


7.94355 
7.94984 
7.95613 


7.96241 
7.96869 
7.97496 


7.98123 
7.98749 
7.99375 


8.00000 


8.00625 
8.01249 
8.01873 


8.02496 
8.03119 
8.03741 


8.04363 
8.04984 
8.05605 


8.06226 




42.2500 2.54951 


42.3801 2.55147 
42.5104 2.55343 
42.6409 2.55539 


2.55734 
2.55930 
2.56125 


2.56320 


42.7716 


42.9025 
43.0336 


43.1649 
43.2964 2.56515 
43.4281 2.56710 


43.6921 2.57099 
43.8244 2.57294 
43.9569 2.57488 


44.0896 2.57682 
44.2225 2.57876 
44.3556 2.58070 


44.4889 2.58263 
44.6224 2.58457
44.7561 2.58650 


45.0241 2.59037 
45.1584 2.59230 
45.2929 2.59422 


45.4276 2.59615 
45.5625 2.59808 
45.6976 2.60000 


45.8329 2.60192 
45.9684 2.60384 
46.1041 2.60576 


46.2400 2.60768 


46.3761 2.60960 
46.5124 2.61151 
46.6489 2.61343 


46.7856 2.61534 
46.9225 2.61725 
47.0596 2.61916 


47.1969 2.62107 
47.3344 2.62298 
47.4721 2.62488 


47.6100 2.62679 


47.7481 2.62869 
47.8864 2.63059 
48.0249 2.63249 


48.1636 2.63439 
48.3025 2.63629 
2.63818 


2.64008 


48.4416 


48.5809 
48.7204 2.64197 
48.8601 2.64386 


49.0000 2.64575 


8.06226 


8.06846 
8.07465 
8.08084 


8.08703 
8.09321 
8.09938 


8.10555 
8.11172 
8.11788 


8.12404 


8.13019 
8.13634 
8.14248 


8.14862 
8.15475 
8.16088 


8.16701 
8.17313 
8.17924 


8.18535 


8.19146 
8.19756 
8.20366 


8.20975 
8.21584 
8.22192 


8.22800 
8.23408 
8.24015 


8.24621 


8.25227 
8.25833 
8.26438 


8.27043 
8.27647 
8.28251 


8.28855 
8.29458 
8.30060 


8.30662 


8.31264 
8.31865 
8.32466 


8.33067 
8.33667 
8.34266 


8.34865 
8.35464 
8.36062 


8.36660 




49.0000 2.64575 


7.00 


55.6516 


49.1401 
49.2804 
49.4209 


2.64764 
2.64953 
2.65141 


2.65330 
2.65518 
2.65707 


2.65895 
2.66083 
2.66271 


49.5616 


49.7025 
49.8436


49.9849 
50.1264 
50.2681 


50.5521 
50.6944 
50.8369 


50.9796 
51.1225 
51.2656 


51.4089 
51.5524 
51.6961 


2.66646 
2.66833 
2.67021 


2.67208 
2.67395 
2.67582 


2.67769 
2.67955 
2.68142 


51.8400 2.68328 


51.9841 
52.1284 
52.2729 


52.4176 
52.5625
52.7076 


52.8529 
52.9984 
53.1441 


2.68514 
2.68701 
2.68887 


2.69072 
2.69258 
2.69444 


2.69629 
2.69815 
2.70000 


53.2900 2.70185 


53.4361 
53.5824 
53.7289 


53.8756 
54.0225 
54.1696 


54.3169 
54.4644 
54.6121 


2.70370 
2.70555 
2.70740 


2.70924
2.71109 
2.71293 


2.71477 
2.71662 
2.71846 


54.7600 2.72029 


54.9081 
55.0564 
55.2049 


55.3536 
55.5025 


2.72213 
2.72397 
2.72580 


2.72764 
2.72947 
2.73130 


2.73313 
2.73496 
2.73679 


2.73861


55.8009 
55.9504 
56.1001 


56.2500


TABLE II 


√10N


8.36660 


8.37257 
8.37854 
8.38451 


8.39047 
8.39643 
8.40238 


8.40833 
8.41427 
8.42021 


8.42615 


8.43208 
8.43801 
8.44393 


8.44985 
8.45577 
8.46168 


8.46759 
8.47349 
8.47939 


8.48528 


8.49117 
8.49706 
8.50294 


8.50882 
8.51469 
8.52056 


8.52643 
8.53229 
8.53815 


8.54400 


8.54985 
8.55570 
8.56154 


8.56738 
8.57321 
8.57904 


8.58487 
8.59069 
8.59651 


8.60233 


8.60814 
8.61394 
8.61974 


8.62554 
8.63134 
8.63713 


8.64292 
8.64870 
8.65448 


8.66025 


√10N


7.50 


7.51 
7.52
7.53 


7.54 
7.55 
7.56 


7.57
7.58 
7.59 


7.60 


7.61 
7.62 
7.63 


7.64 
7.65 
7.66 


7.67 


7.68 


7.69 
7.70 


7.71 


7.72 
7.73 


7.74 
7.75 
7.76 


7.77


7.78 
7.79 


7.80 


7.81 
7.82 
7.83 


7.84 
7.85 
7.86 


7.87 
7.88 
7.89 


7.90 


7.91 
7.92 
7.93 


7.94 
7.95 
7.96 


7.97 
7.98 
7.99 


8.00 


(continued) 




56.2500 2.73861 


56.4001 2.74044
56.5504 2.74226 
56.7009 2.74408 


2.74591 
2.74773 
2.74955 


2.75136 


56.8516 


57.0025 
57.1536 


57.3049 
57.4564 2.75318 
57.6081 2.75500 


57.9121 2.75862 
58.0644 2.76043 
58.2169 2.76225 


58.3696 2.76405 
58.5225 2.76586
58.6756 2.76767 


58.8289 2.76948 
58.9824 2.77128
59.1361 2.77308 


59.2900 2.77489 


59.4441 2.77669 
59.5984 2.77849 
59.7529 2.78029 


59.9076 2.78209
60.0625 2.78388 
60.2176 2.78568 


60.3729 2.78747 
60.5284 2.78927 
60.6841 2.79106 


60.8400 2.79285 


60.9961 2.79464 
61.1524 2.79643 
61.3089 2.79821 


61.4656 2.80000 
61.6225 2.80179 
61.7796 2.80357 


61.9369 2.80535 
62.0944 2.80713 
62.2521 2.80891 


62.4100 2.81069 


62.5681 2.81247 
62.7264 2.81425 
62.8849 2.81603 


63.0436 2.81780 
63.2025 2.81957 
2.82135 


2.82312 


63.3616 


63.5209 
63.6804 2.82489 
63.8401 2.82666 


64.0000 2.82843 


√10N


8.66025 


8.66603 
8.67179 
8.67756 


8.68332 
8.68907 
8.69483 


8.70057 
8.70632 
8.71206 


8.71780 
8.72353 


8.72926 
8.73499 


8.74071 


8.74643 
8.75214 


8.75785 
8.76356 
8.76926 


8.77496 


8.78066 
8.78635 
8.79204 


8.79773 
8.80341 
8.80909 


8.81476 
8.82043 
8.82610 


8.83176 


8.83742 
8.84308 
8.84873 


8.85438 
8.86002 
8.86566 


8.87130 
8.87694 
8.88257 


8.88819 


8.89382 
8.89944 
8.90505 


8.91067 
8.91628 
8.92188 


8.92749 
8.93308 
8.93868 


8.94427 


√10N


64.0000 2.82843


8.50 


64.1601 
64.3204 
64.4809 


2.83019 
2.83196 
2.83373 


2.83549 
2.83725 
2.83901 


2.84077 
2.84253 
2.84429 


64.6416 


64.8025 
64.9636 


65.1249 
65.2864 
65.4481 


65.7721 
65.9344 
66.0969 


66.2596 
66.4225 
66.5856 


66.7489 
66.9124 
67.0761 


2.84781 
2.84956 
2.85132 


2.85307 
2.85482 
2.85657 


2.85832 
2.86007 
2.86182 


67.2400 2.86356 


67.4041 
67.5684 
67.7329 


67.8976 
68.0625 
68.2276 


68.3929 
68.5584 
68.7241 


2.86531 
2.86705 
2.86880 


2.87054 
2.87228 
2.87402 


2.87576 
2.87750 
2.87924 


68.8900 2.88097 


69.0561 
69.2224 
69.3889 


69.5556 
69.7225 
69.8896 


70.0569 
70.2244 
70.3921 


2.88271 
2.88444 
2.88617 


2.88791 
2.88964 
2.89137 


2.89310 
2.89482 
2.89655 


70.5600 2.89828 


2.90000 
2.90172 
2.90345 


2.90517 
2.90689 
2.90861 


2.91033 
2.91204 
2.91376 


70.7281 
70.8964 
71.0649 


71.2336 


71.4025 
71.5716 


71.7409 
71.9104 
72.0801 


72.2500 2.91548 


TABLE II (continued) 


√10N


8.94427 


8.94986 
8.95545 
8.96103 


8.96660 
8.97218 
8.97775 


8.98332 
8.98888 
8.99444 


9.00000 


9.00555 
9.01110 
9.01665 


9.02219 
9.02774 
9.03327 


9.03881 
9.04434 
9.04986 


9.05539 


9.06091 
9.06642 
9.07193 


9.07744 
9.08295 
9.08845 


9.09395 
9.09945 
9.10494 


9.11043


9.11592 
9.12140 
9.12688 


9.13236 
9.13783 
9.14330 


9.14877 
9.15423 
9.15969 


9.16515 


9.17061 
9.17606 
9.18150 


9.18695 
9.19239 
9.19783 


9.20326 
9.20869 
9.21412 


9.21954 




72.2500 2.91548 


72.4201 
72.5904 
72.7609 


2.91719 
2.91890 
2.92062 


2.92233 
2.92404 
2.92575 


2.92746 
2.92916 
2.93087 


72.9316 
73.1025 


73.2736 


73.4449 
73.6164 
73.7881 


73.9600 2.93258 


74.1321
74.3044 
74.4769 


74.6496 
74.8225 
74.9956 


75.1689 
75.3424 
75.5161 


2.93428 
2.93598 
2.93769 


2.93939 
2.94109 
2.94279 


2.94449 
2.94618 
2.94788 


75.6900 2.94958 


75.8641 
76.0384 
76.2129 


76.3876 
76.5625 
76.7376 


76.9129 
77.0884 
77.2641


2.95127 
2.95296 
2.95466 


2.95635 
2.95804 
2.95973 


2.96142 
2.96311 
2.96479 


77.4400 2.96648 


77.6161 
77.7924 
77.9689 


78.1456 
78.3225 
78.4996 


78.6769 
78.8544 
79.0321 


2.96816 
2.96985 
2.97153 


2.97321 
2.97489 
2.97658 


2.97825 
2.97993 
2.98161 


79.2100 2.98329 


79.3881 
79.5664 
79.7449 


79.9236 
80.1025 


2.98496 
2.98664 
2.98831 


2.98998 
2.99166 
2.99333 


2.99500 
2.99666 
2.99833 


80.2816 


80.4609 
80.6404 
80.8201 


81.0000 3.00000 


9.21954 


9.22497 
9.23038 
9.23580 


9.24121 
9.24662 
9.25203 


9.25743 
9.26283 
9.26823 


9.27362 


9.27901 
9.28440 
9.28978 


9.29516 
9.30054 
9.30591 


9.31128 
9.31665 
9.32202 


9.32738 


9.33274 
9.33809 
9.34345 


9.34880 
9.35414 
9.35949 


9.36483 
9.37017 
9.37550 


9.38083 


9.38616 
9.39149 
9.39681 


9.40213 
9.40744 
9.41276 


9.41807 
9.42338 
9.42868 


9.43398 


9.43928 
9.44458 
9.44987 


9.45516 
9.46044 
9.46573 


9.47101 
9.47629 
9.48156 


9.48683 


9.00 81.0000 3.00000 


9.01 81.1801 
9.02 81.3604 
9.03 81.5409 


9.04 81.7216 
9.05 81.9025 
9.06 82.0836 


9.07 82.2649 
9.08 82.4464 
9.09 82.6281 


9.10 82.8100 3.01662 


9.11 82.9921
9.12 83.1744
9.13 83.3569


83.5396 
83.7225 
83.9056 


84.0889 
84.2724 
84.4561 




84.8241
85.0084 
85.1929 


85.3776 
85.5625 
85.7476 


85.9329 
86.1184 
86.3041 




86.6761 
86.8624 
87.0489 


87.2356 
87.4225 
87.6096 


87.7969 
87.9844 
88.1721 


88.5481 
88.7364 
88.9249 


89.1136 


89.3025 
89.4916 


89.6809
89.8704
90.0601 




84.6400 3.03315 


86.4900 3.04959 


88.3600 3.06594 


90.2500 3.08221 


TABLE II 


√10N


9.48683 


9.49210 
9.49737 
9.50263 


9.50789 
9.51315 
9.51840 


9.52365 
9.52890 
9.53415 


9.53939 


9.54463 
9.54987 
9.55510 


9.56033 
9.56556 
9.57079 


9.57601 
9.58123 
9.58645 


9.59166 


9.59687 
9.60208 
9.60729 


9.61249 
9.61769 
9.62289 


9.62808 
9.63328 
9.63846 


9.64365 


9.64883
9.65401 
9.65919 


9.66437 
9.66954 
9.67471


9.67988 
9.68504 
9.69020 


9.69536


9.70052 
9.70567 
9.71082 


9.71597 
9.72111 
9.72625 


9.73139 
9.73653 
9.74166 


9.74679 
9.50 90.2500 3.08221 


9.51 
9.52 
9.53 


9.54 
9.55 
9.56 


9.57 
9.58 
9.59 


9.60 


9.61 
9.62 
9.63 


9.64 
9.65 
9.66 


9.67 
9.68 
9.69 


9.70 


9.71 
9.72 
9.73 


9.74 
9.75 
9.76 


9.77 
9.78 
9.79 


9.80 


9.81 
9.82
9.83 


9.84 
9.85 
9.86


9.87 
9.88 
9.89 


9.90 


9.91
9.92
9.93 


9.94 
9.95 
9.96 


9.97 
9.98 
9.99 


10.0 


90.4401 3.08383 
90.6304 3.08545 


90.8209 3.08707 
91.0116 3.08869 


91.2025 3.09031 
91.3936 3.09192 


91.5849 3.09354 
91.7764 3.09516 
91.9681 3.09677 


92.3521 3.10000 
92.5444 3.10161 
92.7369 3.10322 


92.9296 3.10483 
93.1225 3.10644 
93.3156 3.10805 


93.5089 3.10966 
93.7024 3.11127 
93.8961 3.11288 


94.0900 3.11448 


94.2841 3.11609 
94.4784 3.11769 
94.6729 3.11929 


94.8676 3.12090 
95.0625 3.12250 
95.2576 3.12410 


95.4529 3.12570 
95.6484 3.12730 
95.8441 3.12890 


96.0400 3.13050


96.2361 3.13209 
96.4324 3.13369 
96.6289 3.13528 


96.8256 3.13688 
97.0225 3.13847 
97.2196 3.14006 


97.4169 3.14166 
97.6144 3.14325 
97.8121 3.14484 


98.0100 3.14643 


98.2081 3.14802
98.4064 3.14960
98.6049 3.15119


98.8036 3.15278 
99.0025 3.15436 
3.15595 


99.4009 3.15753
99.6004 3.15911 
99.8001 3.16070 


99.2016 


100.000 3.16228 


9.74679 


9.75192 
9.75705 
9.76217 


9.76729 
9.77241
9.77753 


9.78264 
9.78775 
9.79285 


9.79796 


9.80306 
9.80816 
9.81326 


9.81835 
9.82344 
9.82853 


9.83362 
9.83870 
9.84378 


9.84886 


9.85393 
9.85901 
9.86408


9.86914 
9.87421 
9.87927 


9.88433 
9.88939 
9.89444 


9.89949 


9.90454 
9.90959 
9.91464 


9.91968 
9.92472 
9.92974 


9.93479 
9.93982 
9.94485 


9.94987 


9.95490 
9.95992 
9.96494 


9.96995 
9.97497 
9.97998 


9.98499 
9.98999 
9.99500 


10.0000 
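
Any row of Table II can be regenerated, or a doubtful entry checked, with a few lines of Python. The following is a minimal sketch rather than part of the original table: the function name `table_ii_row` is hypothetical, and the rounding (four decimals for N², five for the roots) is an assumption read off the printed entries.

```python
import math

def table_ii_row(n):
    """Return (N, N squared, sqrt(N), sqrt(10N)), rounded like the printed table."""
    return (
        n,
        round(n * n, 4),              # N squared, four decimals
        round(math.sqrt(n), 5),       # square root of N, five decimals
        round(math.sqrt(10 * n), 5),  # square root of 10N, five decimals
    )

# Print a few rows in the table's column order: N, N^2, sqrt(N), sqrt(10N).
for n in (2.50, 3.00, 9.50):
    print("%.2f  %.4f  %.5f  %.5f" % table_ii_row(n))
```

The √10N column is what lets the table serve numbers outside the range 1 to 10: for example, √25 is read from the √10N column in the row N = 2.50.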


TABLE III


0.2 


0.640 
0.320 
0.040 


0.512 
0.384 
0.096 
0.008 


0.410 
0.410 
0.154 
0.026 
0.002 


0.328 
0.410 
0.205 
0.051 
0.006 


0.262 
0.393 
0.246 
0.082 
0.015 
0.002 


0.210 
0.367 
0.275 
0.115 
0.029 
0.004 


0.168 
0.336 
0.294 
0.147 
0.046 
0.009 
0.001 


0.25 


0.563 
0.375 
0.063 


0.422 
0.422 
0.141 
0.016 


0.316 
0.422 
0.211 
0.047 
0.004 


0.237 
0.396 
0.264 
0.088 
0.015 
0.001 


0.178 
0.356 
0.297 
0.132 
0.033 
0.004 


0.134 
0.312 
0.312 
0.173 
0.058 
0.012 
0.001 


0.100 
0.267 
0.312 
0.208 
0.087 
0.023 
0.004 


BINOMIAL PROBABILITIES 


0.3 


0.490 
0.420 
0.090 


0.343 
0.441
0.189 
0.027 


0.240 
0.412 
0.265 
0.076 
0.008 


0.168 
0.360 
0.309 
0.132 
0.028 
0.002 


0.118 
0.303 
0.324 
0.185 
0.060 
0.010 
0.001 


0.082 
0.247 
0.318 
0.227 
0.097 
0.025 
0.004 


0.058 
0.198 
0.296 
0.254 
0.136 
0.047 
0.010 
0.001 


0.4 


0.360 
0.480 
0.160 


0.216 
0.432 
0.288 
0.064 


0.130 
0.346 
0.346 
0.154 
0.026 


0.078 
0.259 
0.346 
0.230 
0.077 
0.010 


0.047 
0.187 
0.311
0.276 
0.138 
0.037 
0.004 


0.028 
0.131 
0.261 
0.290 
0.194 
0.077 
0.017 
0.002 


0.017 
0.090 
0.209 
0.279 
0.232 
0.124 
0.041 

0.008 
0.001 


0.6 


0.160 
0.480 
0.360 


0.064 
0.288 
0.432 
0.216 


0.026 
0.154 
0.346 
0.346 
0.130 


0.010 
0.077
0.230 
0.346 
0.259 
0.078 


0.004 
0.037 
0.138 
0.276 
0.311
0.187 
0.047 


0.002 
0.017 
0.077 
0.194 
0.290 
0.261 
0.131 
0.028 


0.001 
0.008 
0.041 
0.124 
0.232 
0.279 
0.209 
0.090 
0.017 


0.7 0.75 


0.090 0.063 
0.420 0.375 
0.490 0.563 


0.027 0.016 
0.189 0.141 
0.441 0.422 
0.343 0.422 


0.008 0.004 
0.076 0.047 
0.265 0.211 
0.412 0.422 
0.240 0.316 


0.002 0.001 
0.028 0.015 
0.132 0.088 
0.309 0.264
0.360 0.396 
0.168 0.237 


0.001 

0.010 0.004 
0.060 0.033 
0.185 0.132 
0.324 0.297 
0.303 0.356 
0.118 0.178 


0.004 0.001 
0.025 0.012 
0.097 0.058 
0.227 0.173 
0.318 0.312 
0.247 0.312 
0.082 0.134 


0.001 

0.010 0.004 
0.047 0.023 
0.136 0.087 
0.254 0.208 
0.296 0.312 
0.198 0.267 
0.058 0.100 


0.8 


0.040 
0.320 
0.640 


0.008 
0.096 
0.384 
0.512 


0.002 
0.026 
0.154 
0.410 
0.410 


0.006 
0.051 
0.205 
0.410 
0.328 


0.002 
0.015 
0.082 
0.246 
0.393 
0.262 


0.004 
0.029 
0.115 
0.275 
0.367 
0.210 


0.001 
0.009 


0.046 
0.147 


0.294 


0.336 
0.168 


0.9 


0.010 
0.180 
0.810 


0.001 
0.027 
0.243 
0.729 


0.004 
0.049 
0.292 
0.656 


0.008 
0.073 
0.328 
0.590 


0.001 
0.015 
0.098 
0.354 
0.531 


0.003 
0.023 
0.124 
0.372 
0.478 


0.005 
0.033 
0.149 
0.383 
0.430 


0.95 


0.002 
0.095 
0.902 


0.007 
0.135 
0.857 


0.014 
0.171 
0.815 


0.001 
0.021 
0.204 


0.774 


0.002 
0.031 
0.232 
0.735 


0.004 
0.041 
0.257 
0.698 


0.005 
0.051 
0.279 
0.663 




0.599 
0.315 
0.075 
0.010 
0.001 


0.569 
0.329 
0.087 
0.014 
0.001 


0.540 
0.341 
0.099 
0.017 
0.002 


TABLE III BINOMIAL PROBABILITIES (continued)


P 
0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8


0.387 0.134 0.075 0.040 0.010 0.002 
0.387 0.302 0.225 0.156 0.060 0.018 0.004 
0.172 0.302 0.300 0.267 0.161 0.070 0.021 0.004 0.001 
0.045 0.176 0.234 0.267 0.251 0.164 0.074 0.021 0.009 0.003 
0.007 0.066 0.117 0.172 0.251 0.246 0.167 0.074 0.039 0.017 
0.001 0.017 0.039 0.074 0.167 0.246 0.251 0.172 0.117 0.066 
0.003 0.009 0.021 0.074 0.164 0.251 0.267 0.234 0.176 
0.001 0.004 0.021 0.070 0.161 0.267 0.300 0.302
0.004 0.018 0.060 0.156 0.225 0.302 
0.002 0.010 0.040 0.075 0.134 


0.349 0.107 0.056 0.028 0.006 0.001 
0.387 0.268 0.188 0.121 0.040 0.010 0.002 
0.194 0.302 0.282 0.233 0.121 0.044 0.011 0.001 
0.057 0.201 0.250 0.267 0.215 0.117 0.042 0.009 0.003 0.001 
0.011 0.088 0.146 0.200 0.251 0.205 0.111 0.037 0.016 0.006 
0.001 0.026 0.058 0.103 0.201 0.246 0.201 0.103 0.058 0.026 
0.006 0.016 0.037 0.111 0.205 0.251 0.200 0.146 0.088 
0.001 0.003 0.009 0.042 0.117 0.215 0.267 0.250 0.201 
0.001 0.011 0.044 0.121 0.233 0.282 0.302 
0.002 0.010 0.040 0.121 0.188 0.268 
0.001 0.006 0.028 0.056 0.107 
0.314 0.086 0.042 0.020 0.004 
0.384 0.236 0.155 0.093 0.027 0.005 0.001 
0.213 0.295 0.258 0.200 0.089 0.027 0.005 0.001 
0.071 0.221 0.258 0.257 0.177 0.081 0.023 0.004 0.001 
0.016 0.111 0.172 0.220 0.236 0.161 0.070 0.017 0.006 0.002
0.002 0.039 0.080 0.132 0.221 0.226 0.147 0.057 0.027 0.010
0.010 0.027 0.057 0.147 0.226 0.221 0.132 0.080 0.039
0.002 0.006 0.017 0.070 0.161 0.236 0.220 0.172 0.111
0.001 0.004 0.023 0.081 0.177 0.257 0.258 0.221 

0.001 0.005 0.027 0.089 0.200 0.258 0.295 

0.001 0.005 0.027 0.093 0.155 0.236 

0.004 0.020 0.042 0.086 


0.282 0.069 0.032 0.014 0.002 
0.377 0.206 0.127 0.071 0.017 0.003 
0.230 0.283 0.232 0.168 0.064 0.016 0.002 
0.085 0.236 0.258 0.240 0.142 0.054 0.012 0.001 
0.021 0.133 0.194 0.231 0.213 0.121 0.042 0.008 0.002 0.001 
0.004 0.053 0.103 0.158 0.227 0.193 0.101 0.029 0.012 0.003 
0.016 0.040 0.079 0.177 0.226 0.177 0.079 0.040 0.016 
0.003 0.012 0.029 0.101 0.193 0.227 0.158 0.103 0.053 
0.001 0.002 0.008 0.042 0.121 0.213 0.231 0.194 0.133 
0.001 0.012 0.054 0.142 0.240 0.258 0.236 
0.002 0.016 0.064 0.168 0.232 0.283 
0.003 0.017 0.071 0.127 0.206 
0.002 0.014 0.032 0.069 


0.9 


0.001 
0.007 
0.045 
0.172 
0.387 
0.387 


0.001 
0.011 
0.057 
0.194 
0.387 
0.349 


0.002 
0.016 
0.071 
0.213 
0.384 
0.314 


0.004 
0.021 
0.085 
0.230 
0.377 
0.282 


0.95 


0.001
0.008 
0.063 
0.299 
0.630 


0.001 
0.014 
0.087 
0.329 
0.569 


0.002 
0.017 
0.099 
0.341 
0.540 
0.05


0.513 
0.351
0.111 
0.021 
0.003 


0.488 
0.359 
0.123 
0.026 
0.004 


0.463 
0.366 
0.135 
0.031 
0.005 
0.001 


TABLE III


0.1 


0.254 
0.367 
0.245 
0.100 
0.028 
0.006 
0.001 


0.229 
0.356 
0.257 
0.114 
0.035 
0.008 
0.001 


0.206 
0.343 
0.267 
0.129 
0.043 
0.010 
0.002 


0.2 


0.055 
0.179 
0.268 
0.246 
0.154 
0.069 
0.023 
0.006 
0.001 


0.044 
0.154 
0.250 
0.250 
0.172 
0.086 
0.032 
0.009 
0.002 


0.035 
0.132 
0.231 
0.250 
0.188 
0.103 
0.043 
0.014 
0.003 

0.001 


0.25 


0.024 
0.103 
0.206 
0.252 
0.210 
0.126 
0.056 
0.019 
0.005 
0.001 


0.018 
0.083 
0.180 
0.240 
0.220 
0.147 
0.073 
0.028 
0.008 
0.002 


0.013 
0.067 
0.156 
0.225 
0.225 
0.165 
0.092 
0.039 
0.013 
0.003 
0.001 


0.3 


0.010 
0.054 
0.139 
0.218 
0.234 
0.180 
0.103 
0.044 
0.014 
0.003 
0.001 


0.007 
0.041 
0.113 
0.194 
0.229 
0.196 
0.126 
0.062 
0.023 
0.007 
0.001 


0.005 
0.031 
0.092 
0.170 
0.219 
0.206 
0.147 
0.081 

0.035 
0.012 
0.003 
0.001 


0.001 
0.011 
0.045 
0.111 
0.184 
0.221 
0.197 
0.131 
0.066 
0.024 
0.006 
0.001 


0.001 
0.007 
0.032 
0.085 
0.155 
0.207 
0.207 
0.157 
0.092 
0.041
0.014 
0.003 
0.001 


0.005 
0.022
0.063 
0.127 
0.186 
0.207 
0.177 
0.118 
0.061 
0.024 
0.007 
0.002 


0.6 


0.001 
0.006 
0.024 
0.066 
0.131 
0.197 
0.221
0.184 
0.111 
0.045 
0.011 
0.001 


0.001 
0.003 
0.014 
0.041 
0.092 
0.157 
0.207 
0.207 
0.155 
0.085 
0.032 
0.007 
0.001 


0.002 
0.007 
0.024 
0.061 
0.118 
0.177 
0.207 
0.186 
0.127 
0.063 
0.022 
0.005 


0.7 


0.001 
0.003 
0.014 
0.044 
0.103 
0.180 
0.234 
0.218 
0.139 
0.054 
0.010 


0.001 
0.007 
0.023 
0.062 
0.126 
0.196 
0.229 
0.194 
0.113 
0.041 
0.007 


0.001 
0.003 
0.012 
0.035 
0.081
0.147 
0.206 
0.219 
0.170 
0.092 
0.031 
0.005 


0.75 


0.005 
0.019 
0.056 
0.126 
0.210 
0.252 
0.206 
0.103 
0.024 


0.002 
0.008 
0.028 
0.073 
0.147 
0.220 
0.240 
0.180 
0.083 
0.018 


0.001 
0.003 
0.013 
0.039 
0.092 
0.165 
0.225 
0.225 
0.156 
0.067 
0.013 


BINOMIAL PROBABILITIES (continued) 


0.8 


0.001 
0.006 
0.023 
0.069 
0.154 
0.246 
0.268 
0.179 
0.055 


0.002 
0.009 
0.032 
0.086 
0.172 
0.250 
0.250 
0.154 
0.044 


0.001 
0.003 
0.014 
0.043 
0.103 
0.188 
0.250 
0.231 
0.132 
0.035 


0.9 


0.001 
0.006 
0.028 
0.100 
0.245 
0.367 
0.254 


0.001 
0.008 
0.035 


0.114 


0.257 
0.356 
0.229 


0.002 
0.010 
0.043 
0.129 
0.267 
0.343 
0.206 


0.95 


0.003 
0.021 
0.111 
0.351 
0.513 


0.004 
0.026 
0.123 
0.359 
0.488 


0.001 
0.005 
0.031 
0.135 
0.366 
0.463 
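
The entries of Table III follow from the binomial formula P(x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ, so a column can be recomputed when a printed value is in doubt. The sketch below is illustrative only: the function names are hypothetical, and rounding to three decimals is assumed from the printed entries. The table appears to leave out entries that round to 0.000, whereas this sketch returns them as zeros.

```python
from math import comb

def binomial_probability(n, x, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def table_iii_column(n, p):
    """Probabilities for x = 0, 1, ..., n, rounded to three decimals."""
    return [round(binomial_probability(n, x, p), 3) for x in range(n + 1)]

# Column for n = 5 trials with p = 0.6.
print(table_iii_column(5, 0.6))
```

Because P(x successes with p) equals P(n − x successes with 1 − p), the column for p = 0.6 is the column for p = 0.4 read in reverse, which gives a quick cross-check between the two halves of the table.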


TABLE IV AREAS UNDER THE NORMAL CURVE


An entry in the table is the proportion under the
entire curve which is between z = 0 and a positive
value of z. Areas for negative values of z are
obtained by symmetry.

Second decimal place of z


From Paul G. Hoel, Elementary Statistics, 3rd ed., © 1971, John Wiley and Sons, Inc., New York, 
p. 287. 
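
The areas that Table IV tabulates can also be computed directly. The sketch below assumes the standard normal curve and uses the error function from Python's standard library; the function names are hypothetical and are not taken from the text.

```python
import math

def area_zero_to_z(z):
    """Proportion of the standard normal curve between 0 and z, for z >= 0."""
    return 0.5 * math.erf(z / math.sqrt(2))

def area_between(z_low, z_high):
    """Area between two z values; negative z is handled by symmetry."""
    def signed_area(z):
        # The area from 0 to -z equals the area from 0 to z, counted as negative.
        return math.copysign(area_zero_to_z(abs(z)), z)
    return signed_area(z_high) - signed_area(z_low)

print(round(area_zero_to_z(1.96), 4))      # area from z = 0 to z = 1.96
print(round(area_between(-1.0, 1.0), 4))   # area within one standard deviation
```

The second call mirrors how the table is used by hand: look up the area for z = 1.00 and double it, because the curve is symmetric about z = 0.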
TABLE V CRITICAL POINTS OF THE t DISTRIBUTION


The first column lists the number of degrees of
freedom (ν). The headings of the other columns
give probabilities (P) for t to exceed the entry value.
Use symmetry for negative t values.

ν P = 0.005
1 63.657
2 9.925 
3 5.841 
4 4.604 
5 4.032 
6 3.707 
7 3.499 
8 3.355 
9 3.250 
10 3.169 
11 3.106
12 3.055 
13 3.012 
14 2.977 
15 2.947
16 2.921 
17 2.898 
18 2.878 
19 2.861 
20 2.845 
21 2.831 
22 2.819 
23 2.807 
24 2.797 
25 2.787 
26 2.779 
27 2.771 
28 2.763 
29 2.756 
30 2.750 
40 2.704 
60 2.660 
120 2.617 
∞ 2.576


From Paul G. Hoel, Elementary Statistics, 3rd ed., © 1971, John Wiley and Sons, Inc., New York, 
p. 288. 
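
The critical points listed in Table V can be checked numerically. The sketch here is only an illustration under stated assumptions: it integrates the t density with Simpson's rule (our choice of method, not the source's) and reports the probability that t exceeds a given entry. The function names `t_density` and `upper_tail` are hypothetical, and the entries tested are taken to be upper-tail points for P = 0.005.

```python
from math import exp, lgamma, log, pi


def t_density(t: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))


def upper_tail(t: float, df: int, steps: int = 20000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    # Half the area lies to the right of 0; subtract the part between 0 and t.
    return 0.5 - total * h / 3


for df, entry in ((1, 63.657), (10, 3.169), (30, 2.750)):
    print(df, entry, round(upper_tail(entry, df), 4))
```

Each line should print a probability close to 0.005, which is consistent with reading the listed entries as the points exceeded with probability 0.005.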
TABLE VII  CRITICAL VALUES OF r FOR TESTING ρ = 0


For a two-sided test α is twice the value listed at the
heading of a column of critical r values; hence for
α = 0.05 choose the 0.025 column.


0.05 | 0.025 | 0.010
17 


18 
19 
20 
25 
30 
40 
50
60 
80 
100 


Tables VI and VII are from Paul G. Hoel, Elementary Statistics, 3rd ed., © 1971, John Wiley and
Sons, Inc., New York, pp. 289, 292-294.
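
Critical values of r are closely tied to critical points of t. The sketch here assumes the usual relationship r = t / sqrt(t² + df), where df = n - 2 and n is the number of pairs; whether the printed table is indexed by n or by df cannot be confirmed from this copy, so treat the indexing as an assumption. The function name `critical_r` is hypothetical.

```python
from math import sqrt


def critical_r(critical_t: float, df: int) -> float:
    """Convert a critical point of t (df = n - 2) into a critical value of r."""
    return critical_t / sqrt(critical_t**2 + df)


# Table V lists 2.845 for 20 degrees of freedom (one-tail probability 0.005).
print(round(critical_r(2.845, 20), 3))  # 0.537
```

A sample correlation larger in absolute value than this would be significant at the two-sided 0.01 level with 20 degrees of freedom.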
TABLE VIII  CRITICAL POINTS OF THE χ² DISTRIBUTION


The first column lists the number of degrees of freedom.
The headings of the other columns give probabilities (P)
for χ² to exceed the entry value.

 df     0.05      0.025     0.01      0.005
  1    3.84146   5.02389   6.63490   7.87944
  2    5.99147   7.37776   9.21034   10.5966
  3    7.81473   9.34840   11.3449   12.8381
  4    9.48773   11.1433   13.2767   14.8602
  5    11.0705   12.8325   15.0863   16.7496
  6    12.5916   14.4494   16.8119   18.5476
  7    14.0671   16.0128   18.4753   20.2777
  8    15.5073   17.5346   20.0902   21.9550
  9    16.9190   19.0228   21.6660   23.5893
 10    18.3070   20.4831   23.2093   25.1882
 11    19.6751   21.9200   24.7250   26.7569
 12    21.0261   23.3367   26.2170   28.2995
 13    22.3621   24.7356   27.6883   29.8194
 14    23.6848   26.1190   29.1413   31.3193
 15    24.9958   27.4884   30.5779   32.8013
 16    26.2962   28.8454   31.9999   34.2672
 17    27.5871   30.1910   33.4087   35.7185
 18    28.8693   31.5264   34.8053   37.1564
 19    30.1435   32.8523   36.1908   38.5822
 20    31.4104   34.1696   37.5662   39.9968
 21    32.6705   35.4789   38.9321   41.4010
 22    33.9244   36.7807   40.2894   42.7956
 23    35.1725   38.0757   41.6384   44.1813
 24    36.4151   39.3641   42.9798   45.5585
 25    37.6525   40.6465   44.3141   46.9278
 26    38.8852   41.9232   45.6417   48.2899
 27    40.1133   43.1944   46.9630   49.6449
 28    41.3372   44.4607   48.2782   50.9933
 29    42.5569   45.7222   49.5879   52.3356
 30    43.7729   46.9792   50.8922   53.6720
 40    55.7585   59.3417   63.6907   66.7659
 50    67.5048   71.4202   76.1539   79.4900
 60    79.0819   83.2976   88.3794   91.9517
 70    90.5312   95.0231   100.425   104.215
 80    101.879   106.629   112.329   116.321
 90    113.145   118.136   124.116   128.299
100    124.342   129.561   135.807   140.169
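
Entries of Table VIII with an even number of degrees of freedom can be checked with a closed form: the probability that χ² exceeds x is exp(-x/2) times the sum of (x/2)^k / k! for k from 0 to df/2 - 1. The sketch here is an illustration limited to even df; the function name `chi_square_upper_tail_even` is hypothetical, and odd df would need a different method.

```python
from math import exp


def chi_square_upper_tail_even(x: float, df: int) -> float:
    """P(chi-square > x) for a positive, even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


for df, entry, p in ((2, 5.99147, 0.05), (10, 18.3070, 0.05), (20, 37.5662, 0.01)):
    print(df, entry, p, round(chi_square_upper_tail_even(entry, df), 4))
```

Each computed probability should agree with the heading of the column from which the entry was taken.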


Test 


This test will take about an hour to complete. It includes material from all 
chapters. Make sure you have done all the review problems for each chapter 
before you take it. 


1. Draw a frequency distribution for the following data:


Ly; le 3h AS As 6,2, 25 257 Bo Te Ty 2, 23. Se 4y 35. 12, 95.0;-3,, 63 10; 5, 2, 
33 Wil fan 2 2: 


2. Describe briefly the shape of the frequency distribution in problem 1.

3. What is the median of the data in problem 1?

4. What is the mode of the data in problem 1?

5. What is the mean of the data in problem 1?

6. The data in problem 1 represent a sample drawn from a larger population.
   What is your best estimate of the population standard deviation?

7. In a certain city 10% of the residents are over 60 years old. What is the
   probability of finding no one over 60 in a sample of 10 residents?

8. Establish a 95% confidence interval for μ on the basis of the following sample
   statistics:

   n = 100
   x̄ = 10
   s = 5

9. You are working with samples of size 15 from populations that are
   approximately normally distributed. What table would you use to establish
   confidence intervals?

10. You suspect that two brands of soft drink differ in acidity. You measure the
    acidity of 10 randomly selected bottles of each brand with the following results:

    Brand A      Brand B
    n = 10       n = 10
    x̄ = 6.50     x̄ = 6.30
    s = 0.05     s = 0.04

    Outline an appropriate statistical test. What assumptions must you make, if
    any? Is the difference between the two brands significant at the 1% level?

11. “The power of this test against the alternative μ = 5 is 0.80.” Explain this
    statement.
12. The following data represent growth rates of plants before and after a chemical
    treatment. Outline an appropriate statistical test of the theory that the chemical
    treatment reduces growth rate. What assumptions must you make, if any? Are
    the results significant at the 5% level?

    Plant Before After


NYA NN PRWN — 
Nn Bh Nh AQ — 
— On NN WwW WD 


5 

13. Compute the correlation coefficient between the before and after scores in
    problem 12.

14. Use a regression equation with the data of problem 12 to predict the “after”
    growth rate of a plant with a “before” growth rate of 3.

15. On the basis of sample data you wish to determine whether the variances of
    two populations differ. What statistical test will you use? Write the formula.

16. You wish to test the theory that men have higher self-esteem than women. On
    the basis of a questionnaire you classify a group of men and women into
    categories of high, medium, and low self-esteem. The results are as follows:

             High   Medium   Low
    Men       10      30      10
    Women     25      15      10

    Perform an appropriate statistical test and comment on the results.

17. Complete the analysis of variance below. Are there significant differences
    between groups? (α = 0.01)

                      Sum of squares   df   Variance estimate   F
    Total                  580         29
    Between groups         175          2
    Within groups          405         27

18. What assumptions are required for a two-way analysis of variance?


Answers


1. Class intervals for the frequency distribution: 1-2, 3-4, 5-6, 7-8, 9-10, 11-12, 13-14, 15-16, 17-18

2. Skewed to the right

3. 4

4. 2

5. 5.6

6. 4.47

7. 0.349, or 35% of the time

8. μ is between 9.02 and 10.98 with 95% confidence.
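
The interval for problem 8 can be reproduced with a few lines of Python. This is a sketch under assumptions: it uses the normal value z = 1.96 for 95% confidence and takes the sample standard deviation to be 5, the value that reproduces the printed limits for n = 100 and a sample mean of 10. The function name `confidence_interval` is hypothetical.

```python
from math import sqrt


def confidence_interval(mean: float, sd: float, n: int, z: float = 1.96):
    """Return (low, high) limits: mean plus or minus z standard errors."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# Assumption: s = 5, chosen because it reproduces the printed answer.
low, high = confidence_interval(mean=10, sd=5, n=100)
print(f"{low:.2f} to {high:.2f}")
```

The standard error is 5 / sqrt(100) = 0.5, so the margin is 1.96 × 0.5 = 0.98.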
9. “Critical values of t”

10. Null hypothesis       μ₁ = μ₂
    Alternative           μ₁ ≠ μ₂
    Significance level    α = 0.01
    Critical region       t < -2.878 or t > +2.878

    Assumptions: approximately normal distributions for both brands and
    approximately equal variances

    t = 9.901; the difference is significant.
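
The t statistic for problem 10 can be sketched as follows, assuming the usual pooled-variance formula for two independent samples. The function name `pooled_t` is hypothetical. Carried at full precision, the calculation gives t of about 9.88; the printed 9.901 appears to correspond to rounding the standard error to 0.0202 before dividing. Either value lies far beyond the critical point 2.878 for 18 degrees of freedom.

```python
from math import sqrt


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic with a pooled variance estimate."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


t, df = pooled_t(6.50, 0.05, 10, 6.30, 0.04, 10)
print(f"t = {t:.3f} with {df} degrees of freedom")
```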
11. If μ = 5, you have 80% probability of obtaining significant results.

12. Use difference scores

    Null hypothesis       μ = 0
    Alternative           μ < 0
    Significance level    α = 0.05
    Critical region       t < -1.943

    Assumptions: the population of difference scores is approximately normally
    distributed.

    t = 0; the results are not significant.

13. r = 0.22
14. b = 0.29; the predicted “after” score is 3.31.

15. Use an F test: F = s₁² / s₂²

16. Use a χ² test. χ² = 11.34, which is significant at the 1% level. This result indicates
    that the distribution obtained was unlikely to have occurred by chance if there
    were no difference between men and women, but the results do not support the
    theory that men have higher self-esteem than women.
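
The χ² calculation for problem 16 can be sketched in Python, taking the observed counts as 10, 30, 10 for men and 25, 15, 10 for women. The function name `chi_square_statistic` is hypothetical. With these counts the code gives a statistic of about 11.43, slightly different from the printed 11.34; both exceed 9.21034, the Table VIII entry for 2 degrees of freedom at P = 0.01, so the conclusion is unchanged.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


observed = [
    [10, 30, 10],  # men: high, medium, low
    [25, 15, 10],  # women: high, medium, low
]
statistic, df = chi_square_statistic(observed)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

Most of the statistic comes from the High and Medium columns, where women are over-represented among the high scores; that is why the result does not support the theory as stated.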


17.                   Variance estimate     F
    Total
    Between groups          87.5           5.83
    Within groups           15.0

    The differences between groups are significant.
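
The analysis of variance in problem 17 is completed by dividing each sum of squares by its degrees of freedom and then forming the ratio of the two variance estimates. The sketch here assumes 27 degrees of freedom within groups (the total of 29 minus the 2 between groups), which is the value that reproduces the printed estimate of 15.0. The function name `complete_anova` is hypothetical.

```python
def complete_anova(ss_between, df_between, ss_within, df_within):
    """Return the two variance estimates and the F ratio."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Assumption: within-groups df = 29 - 2 = 27.
ms_between, ms_within, f_ratio = complete_anova(175, 2, 405, 27)
print(f"between = {ms_between:.1f}, within = {ms_within:.1f}, F = {f_ratio:.2f}")
```

The resulting F is then compared with the critical point of the F distribution for 2 and 27 degrees of freedom at α = 0.01.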
18. Assumptions: all observations are independent; all groups have approximately
    normal distributions and approximately equal variances.
Degrees of freedom, with analysis of variance, 166
with t, 91
Difference scores, 126
Error variance, 161
Experimental design, 110, 115


F distribution, critical points, 157 
F ratio, 156 
of a sample, 40
of a statistic, 42 


Hypothesis, alternative, 103 
null, 103 
difference between means, 135
difference scores, 126 
means, 112 
P, 104 
Independent, 60, 72, 135, 150
Interaction, 236, 243 


Linear, 187 


Marginals, 223 
Mean, 24 
confidence intervals for, 82, 86, 88 
estimate of, 81 
hypothesis testing, 112 
sampling distribution of, 62 
Means, difference between, 135, 161 
Mode, 14, 23
Normal distribution, 14, 62
Null hypothesis, 103 


One-tailed test, 142, 151 


p, 43 
confidence intervals for, 97 
estimate of, 81 
hypothesis testing, 104 
sampling distribution of, 55


Parameter, 41 
Population, 38 
Power, 118, 230 
Prediction, 204 


r, 194 

critical values of, 200 
Random, 60, 72 
Range, 26 
Regression, 204 
Replacement, 60 
Residual, 245 


s, 77 
Sample, 38 
Sample size, 90, 120, 227 
Sampling distribution, 42 
Scattergram, 183, 204 
Significant results, 103, 115 
Skewed, 14 
Standard deviation, 26 
estimate of, 77 
of population, 77 
and power, 120 
of sample, 77 


of sampling distribution, 64 
see also Variance 

Statistic, 41 

Success, 56 

Summation sign, 24, 33 

Sum of squares, 167, 170, 241 


t distribution, 90 
critical points, 91 

Test, choice of, 141, 176, 200, 230, 246 

t test, for difference between means, 139, 177 
for difference scores, 130 

Two-tailed test, 142, 151, 160 